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Preface 



Topics in Applied Probability is meant to supplement the collection Applied Probability 2 with examples, 
further problems, and m-files. This collection is not meant to stand alone but rather serve as a miscellany of 
further content not included in the original collection. Topics discussed in this collection include Martingale 
Sequences and Markov Procedures as well as a few others. In addition, the collection of m-files contains both 
the user-defined programs and a collection of m-files for specific problems with properly formatted data. 
MATLAB Files 

These files can be entered into MATLAB by calling the appropriate file. These m-files come from a variety 
of sources ( e.g., exams or problem sets, hence the odd names) and may be useful for examples and exercises. 
In order to allow for easy access to the helpful m-files, a supplementary link has been placed in the upper 
right side of each module in the "Topics in Applied Probability" collection. Selecting this link will launch 
the download of a zip document containing a suite of m-files. Unzipping this file and saving its contents to 
your local system will allow for easy and reliable access when working with MATLAB. 



1 This content is available online at <http://cnx.Org/content/m31871/l.2/>. 
2 Applied Probability <http://cnx.org/content/coll0708/latest/> 



Cost with Price Breaks 3 



Let D be the demand random variable. Suppose there is 

A fiat fee of C for D < a 1 
A cost of Cj per unit for ai < D < a 2 
A cost of c 2 per unit for a 2 < D < a^ 
A cost of eg per unit for 0,3 < D 

Then (see figure) 
g(t) = C+c 1 I Ml (t)(t- a 1 ) + (c 2 -c 1 )I M2 (t) (t - a 2 ) + (c 3 - c 2 ) I Ms (t) (t - a 3 ) where M t = [a t , 00) (1) 

note: Since (t — ai) = at t = ai, we could as well have Mi = (ai, 00). 

Special Case. If D is Poisson Qu), we may obtain an exact solution for E [g (D)] by using the fact that 
E [l Koo) (£>)] = P (D > m) and 

CO h. CO 1. 

S [J Koo) (£>) D] = c-" £) fc|j- = Me"" £ |f=^P>m-l) (2) 

k—rn k—'m—l 

As an alternate approach, we approximate D by an appropriate number of values. 

Exercise 1: Example 

A residential College is planning a camping trip over Spring Recess. The number D of persons 
planning to go is assumed to be Poisson (15). Arrangements for transportation and food have been 
made as follows: 

If D < 4, a minimal flat fee of $450 will be charged. 

If 4 < D < 19, the additional charge is $100 for each person more than 4 (but less than 20). 

If 19 < D, the charge is $80 for each person more than 19. 

Thus, C = 450, ai = 4, a 2 = 19, ci = 100, and c 2 = 80. The distribution for D is Poisson (15). 
Then 

Y = g(D) = 450 + 100/ [4lOo) (£>) (D - 4) + (80 - 100) 7 [19i<so) (£>) (£> - 19) (3) 

= 450 + 100/ [4lOO ) (£>) £> - 207[i 9iOo) (I?) 79 - 400/ [4!Co) (79) + 3807 [19iOo) (79) (4) 

a. Obtain the "exact" solution to E [g (D)}. 

b. Approximate D by terms up to 40 and obtain E [g (D)] and P (g (D) > v) for v = 1000, 1500, 
2000, 2500. 



3 
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Order Statistics 4 



In sampling statistics (see Sec 18.7), we deal with an iid class {Xi : 1 < i < n) of random variables, where 
n is a prescribed positive integer known as the sample size. An observation of this class gives an n-tuple of 
numbers (£i,£2>" ■ ■ >0- As an extension of the extreme values in the case of two variables (Ex 2.11), it is 
often useful to define a random variable Yj whose value for any uj is the smallest of the X, (w); a second 
random variable Y2 whose value at u> is the next smallest of the Xi (u>), and so on through Y n whose value 
at uj is the largest of the Xi (to). We would like to be able to obtain the distributions for these new random 
variables in terms of the common distribution for the X;. We formulate the problem as follows. 

Example : Order Statistics 

Suppose {Xi : 1 < i < n) is iid, with common distribution function F. Let 

• Y\ = smallest of X\ , X2 ,..., X n 

• Y2 = next larger of X\ , X2 ,..., X n 

• ... 

• Y n = largest of X 1} X 2 , -,X n 

Then Y^ is called the icth order statistic for the class {Xi : 1 < i < n}. We wish to determine 
the distribution functions F^ (t) = P (Yfc < t) 1 < k < n. Now, Yj. < t iff k or more of the X; have 
values no greater than t. We may view the process as a Bernoulli sequence of n trials. There is a 
success on the ith trial iff Xi < t. The probability p of a success is p = P (X < t) = F (t). Hence 

n 

F k (t) = P(Y k <t) = P{k or more of the X; lie in {-inf, t]) = ^ C (n, j) F j (t) [1 - F (t)] n ~ j (1) 

j=k 

Remark. Once the common distribution function F for the X; is known, then the Ft are calculated in a 
straightforward manner. For that purpose we may use the MATLAB function cbinom. 

Example 

Suppose the X; are exponential (2). Then Fx (t) = 1 — e~ 2t for positive t. Suppose n = 5. We 
calculate F k (t) for t = 0.1, 0.3,0.5, 0.7,0.9. 



n = 5; 










t = 0.1:0.2:0.9; 










m = length (t) ; 










F = 1 - exp(-2*t) ; 










for i = l:m 










FK(i,:) = cbinom(n,F(i) ,l:n) ; 










end 










disp([t' F' FK]) '/. k = 1 


k = 2 


k = 3 


k = 4 


k = 5 


0.1000 0.1813 0.6321 


0.2249 


. 0445 


0.0046 


0.0002 
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0.3000 


0.4512 


0. 


,9502 


0. 


.7456 


0.4091 


0.1324 


0.0187 


0.5000 


0.6321 


0. 


,9933 


0. 


,9354 


0.7364 


0.3946 


0.1009 


0.7000 


0.7534 


0. 


,9991 


0, 


,9852 


0.9000 


0.6400 


0.2427 


0.9000 


0.8347 


0. 


,9999 


0. 


,9968 


0.9653 


0.8064 


0.4052 



The following special case is important in characterizing the Poisson process (see Sec 21.1). 

Example 

Order statistics for uniformly distributed random variables 

Suppose {Ui : 1 < i < n} is iid, uniform on (0, T\. Determine the distribution functions for the 
order statistics. 

SOLUTION 

The common distribution function for the [/,- is given by F (t) = t/T, < t < T. According 
to the result in Ex 2.16, the icth order statistic Y^ has the distribution function 

F k (t) = P(Y k <t) = Y,C(n,j)(-) ( \ <t <T (2) 

j=k V / \ / 



Chapter 1 

Martingale Sequences 



1.1 Martingale Sequences: The Concept, Examples, and Basic 
Patterns 1 

1.1.1 The concept, examples, and basic patterns 

1.1.1.1 A classical example 

The notion of martingales and related concepts seem to have originated in studies of games of chance similar 
to the following. Suppose 

Y n = a gambler's "gain" on the nth play of a game 
Y"o = the original capital or "bankroll" 

Set X n = for n < 0, X n = J2t=o YkVn > 0. Thus, X n is the capital after n plays, and 

Y = X Q Y n = X n - X n _! V n > (1.1) 

Put U n = (X , Xi, • • • , X n ) and V n = {Y a ,Y x , • • • , Y n ). For any n € N, U n = g n (V n ) and V n = h n (U n ) 
or, equivalently, a ([/„) = a (V n ). Hence E [H\U n ] = E [H\V n ] a.s. 

If Yjv is an independent class with E [Y n ] = OVn > 1, the game is considered fair. In this case, we have 
by (CE5), (CE7), and hypothesis 

E [X n+1 \U n ] = E [Y n+1 \V n ] + E [X n \U n ] = E [Y n+1 ] + X n = X n a.s. (1.2) 

Also E [X n+1 - X n \U n ] = E [Y n+1 \V n ] = a.s. 

Gamblers seek to develop a "system" to improve expected earnings. We examine some typical approaches 
and show their futility. To keep the analysis simple, consider a simple coin-flipping game. Let 

Hk = event of a "head" on the irth component trial 
Tk = Hfc = event of a "tail" on the icth component trial 

The player has a system. He decides how much to bet on each play from the pattern of previous events. 
Let B n [Ih„ — It„] = B n Z n be the result of the nth play, where \B n \ is the amount of the bet; B n > 
indicates a bet on a head; B n < indicates a bet on a tail; B = indicates a decision not to bet. 
Systems take various forms; here we consider two possibilities. 



1 This content is available online at <http://cnx.Org/content/m31076/l.3/>. 
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1. The amount of the bet is determent! by the pattern of outcomes of previous tosses 

B n = g n -i (-Ttfi; Ih 2 , ■ ■ ■ , lH n _ 1 ) Y n = B n Z n Z n = In n — It„ = 2Ih„ — 1 (1-3) 

2. The amount bet is determined by the pattern of previous payoffs 

B n = g n -i (Yi, Y 2 , ■ ■ ■ , Y n _\) = h n -i (£>i, Ih 1 , ••• , B n _ilH n _ 1 ) Y n = B n Z n (1.4) 

Let Yq = Xo = C, a constant. Since C is independent of any random variable, E[H\C] = E [H]. In either 
scheme, by (CE8), (CI5), and the fact E [Z k ] = 

E[Y n+1 \V n ] = E[B n+1 Z n+1 \<f> n (B 1 , I Hl , ••• , B n I H J\ = B n+1 E[Z n+1 ] = a.s. (1.5) 

It follows that 

E [X n+1 \U n ] = E [Y n=1 \v n ] + E [X n \U n ] = + X n a.s. (1.6) 

The "fairness" of the game is not altered by the betting scheme, since decisions must be based on past 
performance. In spite of simple beginnings, the extension and analysis of these patterns form a major 
thrust of modern probability theory. 

1.1.2 Formulation of the concept 

In the following treatment, 

Jn = {X n : n S N} is the basic sequence N = {0, 1, • • • } 
Yjsi = {Y n : n G N} is the incremental sequence 

n 

Y n = X n - X„_a X n = ]T Y k n > 0, [X n = n < 0] (1.7) 

fe=0 

We suppose Zjv is a decision sequence and X-yj ~ Z-^; that is, X n = g n (W n ) = g n (Z , Zi, ■ ■ ■ , Z n ). 

• Xn ~ Zn ifi in ^ Zn 

• If Xn ~ Hn and Hn ~ Zn, then Xn ~ Zn- In particular, if H n = k n (U n ) = k n (X , X\, • • • , X n ), 
then Xn ~ -H^n- 

Definition. If Xjv is integrable and Zjv is a decision sequence, then 

1. Xjv is a martingale (MG) relative to Zjv iff 

Xn ~ ^n and E [X n+ i|W n ] = X n a.s. V n e N (1.8) 

2. Xjv is a submartingale (SMG) relative to Zjv iff 

X N ~ Zn and £ [X n+ i|W n ] > X n a.s. V n e N (1.9) 

3. Xjv is a super martingale (SRMG) relative to Zjv iff 

X N ~ Zn and E [X n+1 \W n ] < X n a.s. V n e N (1.10) 
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Notation. When we write (Xn, Zn) is a martingale (submartingale, supermartingale), we are asserting Xjv 
is integrable, Zjv is a decision sequence, In ~ Zn, and Xjv is a MG (SMG, SRMG) relative to Zjv. 
Definition. If Yjv is integrable and Zjv is a decision sequence, then 

1. Yjv is absolutely fair relative to Zjv iff 

Y N ~ Z N and E [Y„+i|W n ] = a.s. VneN (1.11) 

2. Yjv is favorable relative to Zjv iff 

Y N ~ Z N and E [Y n+ i|W n ] > a.s. VneN (1.12) 

3. Yjv is unfavorable relative to Zjv iff 

Y N ~ Z N and E [Y„+i|W n ] < a.s. VneN (1.13) 

Notation. When we write (In, Zn) is absolutely fair (favorable, unfavorable), we are asserting Y/v is 
integrable, Zjv is a decision sequence, Yn ~ Zn, and Yjv is absolutely fair (favorable, unfavorable) relative 
to Zjv. IXA2-2 

Theorem 1.1: IXA2-1 

If Xjv is a basic sequence and Yjv is the corresponding incremental sequence, then 

1. (Xn, Zn) is a martingale iff (Ysr, Zn) is absolutely fair. 

2. (Xn, Zn) is a submartingale iff (Ysr, Zn) is favorable. 

3. (Xn, Zn) is a supermartingale iff (In, Zn) is unfavorable. 

Proof: 

Let * be any one of the symbols =, >, or <. Then by linearity and (CE7) 

E[X n+1 \W n ] = E[Y n+l \W n ] + E[X n \W n ] = E[Y n+1 \W n ] + X n * (1.14) 
X n a.s. iff E[Y n+l \W n ] * a.s. 

Remarks 

1. (X N , Z n ) is a SMG iff (-X N , Z N ) is a SRMG 

2. We write (S)MG to indicate the same statement can be made for a MG or a SMG with the appropriate 
choice of = or > 

3. We write (>) to indicate simultaneously two cases: 

• (>) read as = in all places (for a MG) 

• (>) read as > in all places (for a SMG) 

1.1.3 Some Basic Patterns 

Theorem 1.2: IXA3-1 

If (X N , Z n ) is a (S)MG and X N ~ iJ N , with H N - Z N , then (X N , iJ N ) is a (S)MG. 
Proof: 
Let K n = {H , Hi, ■■■ , H n ). By (CE9), the (S)MG definition, monotonicity, and (CE7) 

E [X n+1 \K n ] = E{E [X n+1 \W n ] \K n ) (>) E [X n \K n ] = X„ a.s. (1.15) 
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Theorem 1.3: IXA3-2 

For integrable JCn ~ Zn, the following are equivalent 
a (A N , Z N ) is a (S)MG 

b E[X n+k \W n ] (>) X n a.s. V n, fceN 

c £[1^+!] (>)£[/ c IJ V C£a(W n ) 

V ne N 

d £[/ c A„ +fe ] (>) E[I c X n ] V Cea(W„) 

V n, k £ N 
Proof: 

b => a: as a special case 

a => b: By (CE9), (a), and monotonicity 

E [X n+k \W n ] = E{E [X n+k \W n+k ^] \W n } (>) E [l n+ n|lf„] a.s. (1.16) 

fc— 1 iterations yield i£ [X„ + /;|W„] (>) X n a.s. 
d => c: as a special case 
c =► a: By (CE1) and (c), £ [I c A n+1 ] = £{/ c £ [A n+1 |Wg} (>) £ [I c X n ] a.s.. Since 

A„ ~ W n a.s. and E 1 [A n+ i|W„] ~ W n a.s. , the result follows from the uniqueness property 

(E5) 
b =* d: By (CE1), (b), and monotonicity £ [I C X n+k ] = E{I C E [X n+k \W n ]} (>) E [I C X n ] 

We thus have d=>c=>a<^>6=>rf 

Corollary 1.1: IXA3-3 

If (X N , Z n ) is a (S)MG, then £ [A„+ fe ] (>) £ [A„] (>) £ [A ] 

Theorem 1.4: IXA3-4 

(A N , Z n ) is a (S)MG iff E [X q - X p \W n ] (>) a.s. V n < p < q £ N 
Proof: 
EXERCISE. Note A g - A p = Y p+ i + • • • +Y q 

IXA3-2 
Theorem 1.5: IXA3-5 
If (A N , Z n ) is an L 2 MG, then 
E [X q -X p ]=0 
E [X n (A, - X p )] = 
S [(A„ - X m ) {X q - X p )] = 
E [X p X q ] = E [A 2 Ag ] 
E [(X q - X p f] = E [A 2 ] - E [A 2 ] > V 

E[XZ]=ZU E[Y*] 
Proof: 

a. E [X q - X p ] = E{E [X q - X p \W n ]} = by (CElb) and Thm IXA3-4 

b. E [A„ (X q - A p )] = E{X n E \X q - X p \W n ]} = by (CElb), (CE8), and Thm IXA3-4 

c. As in b, since A„ — X m ~ W n 



V 


p < q e N 


V 


n < p < q £ N 


V 


m<n<p<g£N 


V 


p, q£N 


V 


p < q G N 


V 


pe N 
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d. Suppose p < q. Then, since X p ~ W p , E [X p X q ] = E{X p E [X q \W p ]} = E [Xf\ by definition 
of MG. For q < p, interchange p, q in the argument above. 

e. E [(X q - X p f] = E [Xl] - IE \X p X q ] + E [X*] = E [X^] - 2E [X p 2 ] + E [X 2 ] by d, above 

f. By c, E[Y 3 Y k ] = for j / k. Hence, E [X 2 ] = E [(ELo^)'] = T, j T, k E[Y j Y k ] = 

ELo^K] 

A variety of weighted sums of increments are useful. 

Theorem 1.6: IXA3-6 

Suppose (Xn, Zn) is a (S)MG and Yjv is the incremental sequence. Let H be a (nonnegative) 
constant and let H n ~ W n -\, n > 1, be bounded (nonnegative). Set 

n n 

X* n = Y J H k Y k = Y J Yk VneN (1.17) 

k=0 fe=0 

Then (X^, Z N ) is a (S)MG. 
Proof: 

E [Y r : +1 \W n ] =E[H n+1 Y n+1 \W n ] = H n+1 E [Y n+1 \W n ] a.s. by (CE8) 
For MG case: E [Y* +1 \W n ] = a.s. for arbitrary bounded H n 
For SMG case: E [Y* +1 \W n ] > a.s. for H n > 0, bounded 
The conclusion follows from Theorem 1.1, IXA2-1, p. 9 

Remark. This result extends the pattern in the introductory gambling example. Theorem 1.4, IXA3-4, 
p. 10 IXA3-3 

Theorem 1.7: IXA3-7 

In Theorem IXA3-6 (Theorem 1.6, IXA3-6, p. 11), if E [X ] > and < H n < la.s.Vn G N, then 
< E [X*] < E [X n ] VneN 
Proof: 

E[Y n+1 \W n ] > H n+1 E[Y n+1 \W n ] (>)0a.s., by hypothesis, and H n+1 E [Y n+1 \W n ] = 
E {Y* +l \W n \ a.s. , by (CE8). Thus, by monotonicity and (CElb) 

E[Y n+1 ] (>)E[Y* +1 \ (>) VneN and E [Y ] = E [X Q ] > H Q E [Y ] = E [Y*\ (1.18) 

Hence 

n n 

E [X n ] = Y J E\Y k ]>Y J E \Y k *} = E [X* n ] > (1.19) 

fc=0 fe=0 



Some important special cases 

Theorem 1.8: IXA3-8 

Suppose integrable AT N ~ Z N . If X n+1 - X n (>) Oa.s.Vn e N, then (X N , Z N ) is a (S)MG. 
Proof: 
Apply monotonicity and Theorem IXA3-4 (Theorem 1.4, IXA3-4, p. 10) 

Theorem 1.9: IXA3-9 

Suppose Xjv has independent increments. 

a. If E [X n ] = c, invariant with n, then Xjv is a MG. 

b. If E [X n+1 - X n ] (>) 0, Vn G N, then (AT N is a (S)MG. 
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Proof: 

b. For any n, consider any C € a(U n ). By independent increments, {Ic, (X n +i — X n )} 
is independent. Hence, E [I c X n+1 ] - E [I c X n ] = E [I c {X n+1 - X n )] = 

E[I c }E[(X n+1 -X n )](>)0. The desired result follows from Theorem IXA3-2(c) (Theo- 
rem 1.3, IXA3-2, p. 10). 

Theorem 1.10: IXA3-10 

Suppose g is a convex Borel function on an interval I which contains the range of all X n and 
E [\g (X n ) |] < ooVn e N, Let H n = g (X n ) Vn e N, 

a. If (X N , Z N ) is a MG, then (H N , Z N ) is a SMG. 

b. If g is nondecreasing and (Xn, Zn) is a SMG, then so is (i?N, ^n) 

Proof: 

a. : By Jensen's inequality and the definition of a MG 

E[g(X n+1 )\W n ]>g(E\X n+1 \W n ]) = g(X n ) a.s. (1.20) 

b. : By Jensen's inequality 

E[g(X n+1 )\W n ]>g(E[X n+1 \W n ]) a.s. (1.21) 

Since E [X n+ i\Wn\ > X n a.s. and g is nondecreasing, we have 

g(E[X n+1 \W n ])>g(X n ) a.s. (1.22) 

Some commonly utilized convex functions 

1. g(t) = \t\ 

2. g (t) = t 2 g is increasing for t > 

3. g (t) = u (t) t g (X n ) = X+g nondecreasing for all t 

4. g (t) = — u (—t) tg (X n ) = X~g nonincreasing for all t 

5. g (t) = e at , a > Og is increasing for all t 

Theorem 1.11: IXA3-11 

Consider integrable X^ ~ Zn- 

a. If E [X n+ i\W n ] = aX n a.s.\/n and X* n = -^X n \/n, then (AT N , Z N ) is a MG 

b. If E [X n+ i\Wn] > aX n a.s.,a > 0, Vn and°X; = ^X„Vn, then (X N , Z N ) is a SMG 



Proof: 



E [X* n+1 \W n ] = -^E\X n+l \W n ] (>) ~^aX n = X* n a.s. (1.23) 



The restriction! a > is needed in the > case. 
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1.2 Martingale Sequences: Examples and Further Patterns 2 

1.2.1 Examples and further patterns 

Theorem 1.12: A4-1 Sums of Independent Random Variables 
Suppose Yjv is an independent, integrable sequence. Set X n = X^fc=n ^k V n > 0. 
If E [Y n ] (>) V n > 1, then X N is a (S)MG. 

Theorem 1.13: A4-2 Products of nonnegative random variables 
Suppose Yn ~ Zn, Y„ > Oa.s.Vn. Consider Xn : X n = cf|^ =0 Y^, c > 0. 

If E [Y n+1 \Wn] (>) 1 a.s. V n, then (X N , Z N ) is a (S)MG 
Proof: 
X n ~ W„ and X n+ i = y„+iX n . Hence, £ [X n+1 |W„] = X n E [Z n+1 \W n ] (>) X n a.s. V n 

Theorem 1.14: A4-3 Discrete random walk 

Consider Yq = and {Y n : 1 < n} iid. Set X n = J]fc=o YkVn > 0. Suppose P (Y n = k) = p^. Let 



5y ( S ) = S[s y "] = 5> fe s fe , s>0 (1.24) 

k 

Now 5y (1) = 1, g Y (1) = E [Y n ] , <£ (s) = J2k k ( k ~ ^PkS k ~ 2 > for s > 0. Hence, 5y («) = 1 
has at most two roots, one of which is s = 1. 

a. s = 1 is a minimum point iff E [Y n ] = 0, in which case Xjv is a MG (see A4-1 (Theorem 1.12, 
A4-1 Sums of Independent Random Variables, p. 13)) 

b. If g Y (r) = 1 for < r < 1, then E [r y »] = lVn > 1. Let Z = 1, Z„ = r x " = n£=i rY " ■ B y 
A4-2, Zjv is a MG 

For the MG case in Theorem IXA3-6, the Y n are centered at conditional expectation; that is 
E [Y n+ i\W n ] = a.s. The following is an extension of that pattern. 

Theorem 1.15: A4-4 More general sums 

Consider integrable Yn ~ ^n and bounded i?N ~ ^n- Let W n = a constant for n < and H n = 1 
for n < 0. Set 

n 

X n = J2( Yk ~ E [n|W fe _i]}if fe _i V n > (1.25) 

fe=0 

Then (X N , Z N ) is a MG. 
Proof: 

X n ~ W n ; Vn > and £ [X n+ i|W„] = X n + H n E{Y n+1 - E [Y n+1 \W n ] \W n } = X n + Oa.s. 

IXA4-2 

Theorem 1.16: A4-5 Sums of products 

Suppose Yjv is absolutely fair relative to Zjv, with E [|Y„| fc ] < ooVn, fixed k > 0. For n> k, set 

In= J] Y n Y 2 ---Y fc ~G n (1.26) 

0<ii<---<n 

Then {X Nk , Z Nk ) N k = {k, k + 1, k + 2, • • • } is a MG, 



2 This content is available online at <http://cnx.Org/content/m31067/l.3/>. 
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Proof: 

X n+ i = X n + K n+ i, where 

K n+1 = Y n+1 Y, ^^■■■Y lk _ 1 =Y n+1 K* n K* n ~W n (1.27) 

0<ii<---<n 

E[K n+1 \W n } = K* n E[Y n+1 \W n } = Oa.s. Vn>fc (1.28) 



We consider, next, some relationships with homogeneous Markov sequences. 

Suppose (Xn, -Zn) is a homogeneous Markov sequence with finite state space E = {1, 2, ■•• , M} 
and transition matrix P = \p(i, j)]. A function f on E is represented by a column matrix f = 
[/ (1) , / (2) , • • • , / (M)] . Then / (X n ) has value / (A;) when X n = k. Pf is an m x 1 column matrix 
and Pf (j) is the jth element of that matrix. Consider E [f (X n+ i) \W n ] = E[f (X n+ \) \X n ] a.s. . Now 

E[f(X n+1 )\X n =j} = J2f(k)p(J,k) = Pf(j) so that E[f(X n+1 )\W n ]=Pf(X n ) (1.29) 

fees 

A nonnegative function f on E is called (super)harmonic for P iff Pf (<) /. 

Theorem 1.17: A4-6 Positive supermartingales and superharmonic functions. 

Suppose (Xn, Zn) is a homogeneous Markov sequence with finite state space E = {1, 2, • • • , M} 
and transition matrix P = [p(i, j)}. For nonnegative f on E, let Y n = f (X n ) V n € N. Then 
(In, ^n) is a positive (super) martingale P(SR)MG iff f is (super) harmonic for P. 
Proof: 

As noted above E [f (X n+1 ) \W n ] = Pf (X n ). 

1. If f is (super) harmonic Pf (X n ) (<) / (X„) = F„, so that 

E[r n+1 |W„] (<) y„ a.s. (1.30) 

2. If (Yn, Z n ) is a P(SR)MG, then 

Y n = f (X n ) (>) E [/ (X„ +1 ) | W n ] = P/ (X„) a.s. , so that / is (super)harmonic (1.31) 



IX A4-3 

An eigenfunction f and associated eigenvalue A for P satisfy Pf = Xf (i.e., (XI — P) f = 0). In most 
cases, |A| < 1. For real A, < A < 1, the eigenfunctions are superharmonic functions. We may use the 
construction of Theorem IXA3-12 to obtain the associated MG. 

Theorem 1.18: A4-7 Martingales induced by eigenfunctions for homogeneous Markov sequences 
Let (Yn, -^n) be a homogenous Markov sequence, and f be an eigenfunction with eigenvalue A. 
Put X n = X~ n f (Y n ). Then, by Theorem IAXA3-12, (X N , Z N ) is a MG. 

Theorem 1.19: A4-8 A dynamic programming example. 

We consider a horizon of N stages and a finite state space E = {1, 2, • • • , M}. 

• Observe the system at prescribed instants 

• Take action on the basis of previous states and actions. 

Suppose the observed state is j and the action is a £ A. Two results ensue: 
1. A return r(j,a) is realized 
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2. The system moves to a new state 

Let: 

Y n = state in nth period, < n < N — 1 

A n = action taken on the basis of Yq, Aq, ■ ■ ■ , Y n _\, A n _i, Y n 

[Ao is the initial action based on the initial state Yo] 

A policy tt is a set of functions (no, tt\, • • • , 7T/v-i), such that 

A n = ir n {Y , A , ■■■ , y„_i, A„_ 1; Y n ) 0<n<N-l 
The expected return under policy tt, when Yo = jo is 

rjv-i 



(1.32) 



R(ir,jo) = E 



5>(y fc ,40 



fc=0 



(1.33) 



The goal is to determine tt to maximize R(tt, jo). 

Let Z k = (Y k , A k ) and W n = (Z , Z x , ■■■ , Z n ). If {Y k : < k < N - 1} is Markov, then use 
of (CI9) and (CI11) shows that for any policy the Z-process is Markov. Hence 



E [I M (Y n+1 ) | W n ] = E [I M (y„+i) | Z n ] a.s. Vn: < n < N - 1, V Borel sets M 
We assume time homogeneity in the sense that 

P (Y n+1 = j\Y n = i, A n = a) = p (j\i, a) , invariant with n, V i, j € E, V a s A 

We make a dynamic programming approach 
Define recursively /jv, /n-i, • ■ ■ , /o as follows: 
/jv (?) = 0. V i e E. For n = N, N - 1, • • • , 2, 1, set 

fn-i (j) = max {r (j, a) + ^ /„ (fc) p (fc| j, a) : a e A} 



(1.34) 
(1.35) 



feeE 



Put 



X n = J2ifk (Yk) - E \f k (Y k ) |W fc _i]} 



(1.36) 



(1.37) 



fc=i 



Then, by A4-2 (Theorem 1.13, A4-2 Products of nonnegative random variables, p. 13), (X^, Zn) 
is a MG, with E [X n ] = 0, < n < N and 

fn-i (Y n -i) > r (y n _! , ^„_i) + ^ /„ (fc) p (&|Z„_i) = r (y n _! , An-i) + E [/„ (y„) I W„_i] (1.38) 

feeE 

IX A4-4 

We may therefore assert 



o = e [x N \ = e (e£Li{A (n) - £ [/* (n) i^-i]}) > (i-39) 
e (Ef=i{/ fc (n) + *■ (n_i, A fc _o - / fc _! (n_i)}) 



£ 



JV-l 



^r(y fe , A fe ) + /^(yv)-/ (y ) 



Lfc=o 



E 



N-l 



z 2 r ( Y k ,A k 



Lfe=o 



S[/o (Xo)] 



(1.40) 
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Hence, R(n, Y ) < E[f (Y )]. For Y = j , R(n, j ) < /o (jo)- If a policy tt* can be found 
which yields equality, then n is an optimal policy. 

The following procedure leads to such a policy. 

• For each j e E, let 7r*_ : (Yq, Aq, Y\, A\, • • • , A„_2, j) = tt^_i (j) be the action which maximizes 

r (j, a) + Y,fn (k)p(k\j, a) = r (j, a) + E [/„ (Y n ) \Y n ^ = j, A n ^ = a] (1.41) 

feGE 

Thus, a; = < (y„). 

• Now, / n _i (l^n-i) = t (Y n _i, ^4^_i) — -E [/„ (Y n ) |^_i], which yields equality in the argument above. 
Thus, i? (-7T*, j) = /o (j) and 7r is optimal. 

JVote that jt is a Markov policy, A* n = 7r* (F n ). The functions f n depend on the future stages, but once 
determined, the policy is Markov. 

Theorem 1.20: A4-9 Doob's martingale 

Let X be an integrable random variable and Zjv an arbitrary sequence of random vectors. For each 
n, let X n = E [X\W n ]. Then (X N , Z N ) is a MG. 
Proof: 

E [\X n \] = E{\E [X\W n ] |} < E{E [\X\ \W n ]} = E [\X\] < oo (1.42) 

E [X n+1 \W n ] = E{E [X\W n +i] \W n } = E [X\W n ] = X n a.s. (1.43) 



Theorem 1.21: A4-9a Best mean-square estimators 

If X € L 2 , then X n = E[X\W n ] is the best mean-square estimator of X, given W n = 
{Z , Z 1} • • • , Z n ). (X N , Z N ) is a MG. 

Theorem 1.22: A4-9b Futures pricing 

Let Xjv be a sequence of "spot" prices for a commodity. Let to be the present and to + T be a 
fixed future. The agent can be expected to know the past history Ut = {Xq, X\, ■ ■ ■ , X to ), and 
will update as t increases beyond to- Put Yk = E [X to +T\Ut +k], the expected futures price, given 
the history up to to + k. Then {Yk : < k < T} is a Doob's MG, with Y = X to +T, relative to 
{Z k : < k < T}, where Z = U to and Z k = X to+k for 1 < k < T. 

Theorem 1.23: A4-9c Discounted futures 

Assume rate of return is r per unit time. Then a = 1/ (1 + r) is the discount factor. Let 

V k = E [a T - k X to+T \U to+k ] = a T ~ k Y k (1.44) 

Then 

E [V k+1 \U ta+k ] = a T - k E [Y k+1 \U to+k ] = a T - k - x Y k > a T ~ k Y k = V k a.s. (1.45) 

Thus {V k : < k < T] is a SMG relative to {Z k : < k < T}. 

Implication from martingale theory is that all methods to determine profitable patterns of prediction 
from past history are doomed to failure. 
IX A4-5 
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Theorem 1.24: A4-10 Present discounted value of capital 

If a = 1/ (1 + r) is the discount factor, X n is the dividend at time n, and V n is the present value, 
at time n, of all future returns, then 

oo oo oo 

Vn = J2 akx n+k so that V n+1 = Y^ a k X n+k+1 = ^ a fe_1 X„ +/s (1.46) 

fe=l fe=l k=2 

1 oo 

- V a fe X„ +fc - X n+1 = (1 + r) V n - X n+1 (1.47) 

r\ *■ — * 



a 

fe=i 



Note that V n+1 {>) V n iff r (>) X n+1 /V n . Set y n = £[T4|£/„]. Then Y n+1 = 
{l + r)E [V n \U n+1 ] - X n+1 a.s. so that 

E [Y n+1 \U n ] = (l + r)Y n -E [X n+1 \U n ] (1.48) 

Thus, (y N , X N ) is a (S)MG iff 

i?[X„ + i|C/„] Expected return next period, given U n 
~ E[V n \Un\ Expected present value, given U n 



1.2,2 Summary: Convergence of Submartingales 

The submartingale convergence theorem 

Theorem 1.25: 

If (Xn, Zn) is a SMG with limE [X+] < oo, then there exists Xoo ~ W^ such that X n — > Xoo a.s. 

n 

Uniform integrability and some convergence conditions 
Definition. The class {X t : t G T} is uniformly integmble iff 

sup{£[/ { | Xt | >a} |X t |] : (eT}^0 asa^oo (1.50) 

Theorem 1.26: 

Any of the following conditions ensures uniform integrability: 

1. The class is dominated by an integrable random variable Y. 

2. The class is finite and integrable. 

3. There is a u.i. class {Y t : t€ T] such that \X t \ < \Y t \ a.s. for all t e T. 

4. X integrable implies Doob's MG {X n = E [X\W n ] : n e N} is u.i. 

Definition. The class {X t : t s T} is uniformly absolutely continuous iff for each e > there is a 5 > 
such that P (A) < 6 implies sup{E [I A \X t \] : t e T} < e. 

T 

Theorem 1.27: 

X T = {X t : t e T} is u.i. iff both (i) X T is u.a.c, and (ii) sup{E [\X t \]} < oo. 

T 

Definition. X n -> X iff P (\X n - X\ > e) — > as n — > oo for all e > 0. 

X„ ^ X iff S[|X„- X| p ] -> Oasrn oo (1.51) 

Theorem 1.28: 

p 

X n — > X a.s. implies J n — > X 
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Theorem 1.29: 

If (i) X n — * X, (ii) X n — » X a.s. and (iii) /zm_E [X n \Z] exists a.s., then 

n 

UmE[X n \Z] =E[X\Z] a.s. (1.52) 

n 

Theorem 1.30: 

Suppose (Xn, Zn) is a (S)MG. Consider the following 

(A) : limE [X+] < oo or, equivalently, supE [\X n \] <oo 

" n 

(a) : Xjv is uniformly integrable. 
(a + ) : X^ is uniformly integrable. 

(b) : X n £ X. 

(b+) : X+ ^ X. 

(c) : There is an integrable Xoo ~ Woo such that 

Xn^Xooa.s. and ^[XoolWn] (>) X„ a.s. VneN (1.53) 

(c') : Condition (c) with > even for a MG. 

(d) : There is an integrable X with E [X\W„] (>) X n a.s. VneN 
(d') : Condition (d) with > even for a MG. 

Then 

1. Each of the propositions (a) through (d) implies (A), hence SMG convergence. 

2. (a) => (a+) 

3. (a) O (b) => (c) => (d) 

4. (a+) o (b+) «• (c') o (d') 

5. For a MG, (d) => (a), so that (a) O (b) O (c) O (d) 

The notion of regularity is characterized in terms of the conditions in the theorem. 

Definition. A martingale (Xn, Zn) is said to be martingale regular iff the equivalent conditions (a), 
(b), (c), (d) in the theorem hold. 

A submartingale (Xn, Zn) is said to be submartingale regular iff the equivalent conditions (a + ), 
(b + ), (c), (d) in the theorem hold. 

Remarks 

1. Since a MG is a SMG, a martingale regular MG is also submartingale regular. 

2. It is not true, in general that a submartingale regular SMG is martingale regular. We do have for SMG 
(a) o (b) =* (c) =* (d). 

3. Regularity may be viewed in terms of membership of X^ in the (S)MG. The condition 
E [XqoIWjj] (>) X n a.s. is indicated by saying X^ belongs to the (S)MG or by saying the (S)MG 
is closed (on the right) by X^. 

Summary 

For a martingale (Xn , Zn) 

1. If martingale regular, then X„ — > Xoo ~ Wooa.s. and X„ = E [X^Wn] a.s^n e N_E[X n+ fc|W n ] = 
£{£ [XoolWn+k] |W„} = E [X^Wg = X„a.s. and E [X ] = £ [X„] = £ [Xoo] Vn e N 

2. If submartingale regular, but not martingale regular, then X„ — > X^ ~ Wooa.s. but .EjXoolWjJ > 

X„a.s.Vn eNand£ [X ] = E [X n ] < E [X^] < ooVn e N 



For a submartingale(X^ , Zn) 

Either martingale regularity or submartingale regularity implies 

X n -» X^ ~ W^ o.s. and X n < E [X n+1 \W n ] < E [X TO |Wg a.*. V n e N 

and E [X ] < E [X n ] < E [X^] < oo VneN 

If Xjv is uniformly integrable, then E [X n ] — > E 1 [X^]. 

Theorem 1.31: 

If (X N , Z N ) is a MG with E [X%\ < KVnfiN, then the proceess is MG regular, with 



X n 
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X x o.s. £ [(Xoo - X n ) 2 ] ^ and £ [X n ] = E [XJ VneN (1.54) 
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Chapter 2 

Markov Procedures for Markov Decision 
Processes 

2.1 A Reorder problem- Electronic Store 1 
2.1.1 A reorder problem 

Example 2.1 

An electronic store stocks a certain type of DVD player. At the end of each week, an order is placed 
for early delivery the following Monday. A maximum of four units is stocked. Let the states be the 
number of units on hand at the end of the sales week: E = {0, 1, 2, 3, 4}. Two possible actions: 



• Order two units, at a cost of $150 each 

• Order four units, at a cost of $120 each 



Units sell for $200. If demand exceeds the stock in hand, the retailer assumes a penalty of $40 
per unit (in losses due to customer dissatisfaction, etc.). Because of turnover, return on sales is 
considered two percent per week, so that discount is a = 1/1.02 on a weekly basis. 

In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2 there are two 
possible actions: order or 2. In states 3 and 4, the only action is to order 0. Customer demand 
in week n + 1 is represented by a random variable D n+ i. The class is iid, uniformly distributed on 
the values 0, 1, 2, 3, 4. If X n is the state at the end of week n, then {X n , D n+ i} is independent 
for each n. 

Analyze the system as a Markov decision process with type 3 gains, depending upon current 
state, action, and demand. Determine the transition probability matrix PA (properly padded) and 
the gain matrix (also padded). Sample calculations are as follows: 



State 0, action 
State 0, action 2 
State 2, action 2 



poo(0) = l(allotherp fc(0) = 
Poo (2) = P {D > 2) = 3/5, poi (2) = P (D = 1) = 1/5, etc. 

p 2j {k) = 1/5, k = 0,1,2,3,4 
For state = i, action = a, and demand = k, we seek g (i, a, k) 

g(0,0,k) = -40fc 0-40-80-120-160 

g (0, 2, jfc) = -300 + 200min{k, 2} - 40max{k - 2, 0} -300 -100 100 60 20 
g (0, 4, k) = -480 + 200fc -480 -280 -80 120 320 



1 This content is available online at <http://cnx.Org/content/m31073/l.4/>. 
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PROCESSES 

1. Complete the transition probability table and the gain table. 

2. Determine an optimum infinite-horizon strategy with no discounting. 

3. Determine an optimum infinite-horizon strateby with discounting (alpha = 1/1.02). 

4. The manager decides to set up a six-week strategy, after which new sales conditions may be 
established. Determine an optimum strategy for the six-week period. 

Data file 

'/, file orderdata.m 
'/. Version of 4/5/94 
°/. Data organized for computation 
type = 3; 
states = 0:4; 
= [0 2 4 . . . '/. Actions (padded) 

2 02 . . . 

2 02 . . . 

00 00 ... 

00 00] ; 
C = [0 -300 -480 ... '/. Order costs (padded) 
-300 -300 . . . 
-300 -300 . . . 
. . . 
0]; 

SP = 200; '/. Selling price 
BP = 40; '/. Backorder penalty 
PD = . 2*ones(l,5) ; '/, Demand probabilities 



2,1,2 Transition Probabilities and Gains 

The procedure 

'/, file reorder. m 
'/. Version of 4/11/94 

'/, Calculates PA and GA for reorder policy 
states = input ('Enter row vector of states '); 
A = input('Enter row vector A of actions (padded) '); 
C = input ('Enter row vector C of order costs (padded) '); 
D = input ('Enter row vector D of demand values '); 
PD = input ('Enter row vector PD of demand probabilities '); 
SP = input ('Enter unit selling price SP '); 
BP = input ('Enter backorder penalty cost BP '); 
m = length(states' ) ; 
q = length(A) ; 
na = q/m; 
N = length(D) ; 
S = ones (na, 1) Estates; 
S = S(:)'; 

[d,s] = meshgrid(D,S) ; 
a = A'*ones(l,N) ; 
ca = C'*ones(l,N) ; 
TA = (s + a - d) . *(s + a - d >= 0) ; 
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for i = l:q 

PA(i,:) = tdbn(states,TA(i, :) ,PD) ; 

end 

PA 

GA = ca + SP*d - (SP + BP)*(d -s -a).*(d > s+a) 

The calculations 



orderdata 
reorder 

Enter row vector of states states 
Enter row vector A of actions (padded) A 
Enter row vector C of order costs (padded) C 
Enter row vector D of demand values D 
Enter row vector PD of demand probabilities PD 
Enter unit selling price SP SP 
Enter backorder penalty cost BP BP 
PA = 



1.0000 
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Infinite-horizon strategy (no discounting) 

polit 
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CHAPTER 2. MARKOV PROCEDURES FOR MARKOV DECISION 

PROCESSES 



Data needed: 



Enter case number to show gain type case 

Enter row vector of states states 

Enter row vector A of possible actions A 

Enter value of alpha (= 1 for no discounting) 1 

Enter matrix PA of transition probabilities PA 

Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities PD 

Index Action Value 

-80 

2 2 -44 

3 4 -80 

4 112 

5 2 52 

6 2 52 

7 256 

8 2 100 

9 2 100 

10 352 

11 352 

12 352 

13 400 

14 400 

15 400 

Initial policy: action numbers 
2 1111 
Policy: actions 

2 

New policy: action numbers 

3 2 2 11 
Policy: actions 

4 2 2 



Long-run distribution 
0.2800 0.2000 0.2000 0.2000 0.1200 
Test values for selecting new policy 
Index Action Test Value 



1.0000 







-248.0000 


2.0000 


2. 


0000 


-168.8000 


3.0000 


4. 


0000 


-41.6000 


4.0000 







-48.8000 


5.0000 


2. 


0000 


-5.6000 


6.0000 


2. 


0000 


-5.6000 


7.0000 







131.2000 


8.0000 


2. 


0000 


138.4000 


9.0000 


2. 


0000 


138.4000 


10.0000 







294.4000 


11.0000 







294.4000 


12.0000 







294.4000 


13.0000 







438.4000 
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14.0000 





438.4000 


15.0000 





438.4000 


Opt imum 


policy 




State Action Value 







4.0000 


-168.0000 


1.0000 


2.0000 


-132.0000 


2.0000 


2.0000 


12.0000 


3.0000 





168.0000 


4.0000 





312.0000 



Long-run expected gain per period G 
126.4000 

Infinite-horizon strategy (with discounting) 

polit 
Data needed: 



Enter type number to show gain type 

Enter row vector of states states 

Enter row vector A of possible actions A 

Enter value of alpha (= 1 for no discounting) 1/1.02 

Enter matrix PA of transition probabilities PA 

Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities PD 



Index Action 


. Value 


1 -80 




2 -44 




3 -80 




4 112 




5 2 52 




6 2 52 




7 256 




8 2 100 




9 2 100 




10 352 




11 352 




12 352 




13 400 




14 400 




15 400 




Initial policy: 


action numbers 


2 1111 




Policy: actions 




2 





New policy: action numbers 

3 2 2 11 
Policy: actions 

4 2 2 

Test values for selecting policy 
Index Action Test Value 
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' ' PROCESSES 

1.0e+03 * 
0.0010 6.0746 
0.0020 0.0020 6.1533 
0.0030 0.0040 6.2776 
0.0040 6.2740 
0.0050 0.0020 6.3155 
0.0060 0.0020 6.3155 
0.0070 6.4533 
0.0080 0.0020 6.4576 
0.0090 0.0020 6.4576 
0.0100 6.6155 
0.0110 6.6155 
0.0120 6.6155 
0.0130 6.7576 
0.0140 6.7576 
0.0150 6.7576 
Optimum policy 
State Action Value 
1.0e+03 * 

0.0040 6.2776 
0.0010 0.0020 6.3155 
0.0020 0.0020 6.4576 
0.0030 6.6155 
0.0040 6.7576 

Finite-horizon calculations 

dpinit 
Initialize for finite horizon calculations 
Matrices A, PA, and GA, padded if necessary 
Enter case number to show gain type case 
Enter vector of states states 
Enter row vector A of possible actions A 
Enter matrix PA of transition probabilities PA 
Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities PD 
Call for dprog 
dprog 

States and expected total gains 
12 3 4 
-44 112 256 352 400 
States Actions 

2 

1 

2 

3 

4 
dprog 

States and expected total gains 

1.0000 2.0000 3.0000 4.0000 

135.2000 178.4000 315.2000 478.4000 615.2000 

States Actions 

4 
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1 2 




2 2 




3 




4 




dprog 




States 


and expected total gains 





1.0000 2.0000 3.0000 4.0000 


264.4800 300.4800 444.4800 600.4800 744.4800 


States 


Actions 


4 




1 2 




2 2 




3 




4 




dprog 




States 


and expected total gains 


1.0000 


2.0000 3.0000 4.0000 


390.8800 426.8800 570.8800 726.8800 870.8800 


States 


Actions 





4 


1 


2 


2 


2 


3 





4 






dprog 
States and expected total gains 








1.0000 


2.0000 3. 


.0000 


4. 


,0000 


517.2800 


553.2800 


697.2800 853. 


,2800 


997 


,2800 


States 


Actions 















4 










1 




2 










2 




2 










3 















4 















dprog 














States 


and expected 


total gains 








1.0e+03 


* 















0.0010 


0.0020 0. 


.0030 


0, 


,0040 


0.6437 


0.6797 


0.8237 0. 


,9797 


1. 


.1237 


States 


Actions 















4 










1 




2 










2 




2 










3 















4 
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PROCESSES 

2.2 Markov Decision - Type 3 Gains 2 

Example 2.2 

An electronic store stocks a certain type of VCR. At the end of each week, an order is placed for 
early delivery the following Monday. A maximum of four units is stocked. Let the states be the 
number of units on hand at the end of the sales week: Eb = {0, 1, 2, 3, 4} . Two possible actions: 

• Order two units, at a cost of $150 each 

• Order four units, at a cost of $120 each 

Units sell for $200. If demand exceeds the stock in hand, the retailer assumes a penalty of $40 
per unit (in losses due to customer dissatisfaction, etc.). Because of turnover, return on sales is 
considered two percent per week, so that discount is a = 1/1.02 on a weekly basis. 

In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2 there are two 
possible actions: order or 2. In states 3 and 4, the only action is to order 0. Customer demand 
in week n + 1 is represented by a random variable D n+ i. The class is iid, uniformly distributed on 
the values 0, 1, 2, 3, 4. If X n is the state at the end of week n, then {X n , D n+ \] is independent for 
each n. 

Analyze the system as a Markov decision process with case 3 gains, depending upon current 
state, action, and demand. Determine the transition probability matrix PA (properly padded) and 
the gain matrix (also padded). Sample calculations are as follows: 

• State 0, action 0: poo (0) = 1 (all other pok (0) = 0) 

• State 0, action 2: p Q0 (2) = P (D > 2) = 3/5, p 01 (2) = P (D = 1) = 1/5, etc. 

• State 2, action 2: p 2 j (k) = 1/5, k = 0, 1, 2, 3, 4 

For state = i, action = a, and demand = k, we seek g (i, a, k) 
g(0,0,k) = -40fc 

g (0, 2, k) = -300 + 200min{k, 2} - AQmax{k - 2, 0} -300 
5 (0,4,fc) = -480 + 200fc -480 

1. Complete the transition probability table and the gain table. 

2. Determine an optimum infinite-horizon strategy with no discounting. 

3. Determine an optimum infinite-horizon strateby with discounting (alpha = 1/1.02). 

4. The manager decides to set up a six-week strategy, after which new sales conditions may be 
established. Determine an optimum strategy for the six-week period. 

Data file 

7, file orderdata.m 

'/. Version of 4/5/94 

°/. Data organized for computation 

type = 3; 

states = 0:4; 

A = [0 2 4 . . . */. Actions (padded) 



-40 


-80 


-120 


-160 


100 


100 


60 


20 


280 


-80 


120 


320 



2 This content is available online at <http://cnx.Org/content/m31065/l.3/>. 
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2 02 . . . 

2 02 . . . 

00 00 . . . 

00 00] ; 

C = [0 -300 -480 ... '/. Order costs (padded) 

-300 -300 . . . 

-300 -300 . . . 

. . . 

0] ; 
SP = 200; '/. Selling price 

BP = 40; '/. Backorder penalty 

PD = 0. 2*ones(l,5) ; '/, Demand probabilities 



2,2,1 Transition Probabilities and Gains 

The procedure 

7, file reorder. m 
'/. Version of 4/11/94 

'/, Calculates PA and GA for reorder policy 
states = input ('Enter row vector of states '); 
A = input ('Enter row vector A of actions (padded) '); 
C = input ('Enter row vector C of order costs (padded) '); 
D = input ('Enter row vector D of demand values '); 
PD = input ('Enter row vector PD of demand probabilities '); 
SP = input ('Enter unit selling price SP '); 
BP = input ('Enter backorder penalty cost BP '); 
m = length(states' ) ; 
q = length (A); 
na = q/m; 
N = length (D) ; 
S = ones(na, 1) *states; 
S = S(:)'; 
[d,s] = meshgrid(D,S) ; 
a = A'*ones(l,N) ; 
ca = C'*ones(l,N) ; 

TA = (s + a - d) . *(s + a - d >= 0) ; 
for i = l:q 
PA(i,:) = tdbn(states,TA(i, :) ,PD) ; 
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end 

PA 

GA = ca + SP*d - (SP + BP)*(d -s -a).*(d > s+a) 

The calculations 

orderdata 
reorder 
Enter row vector of states states 
Enter row vector A of actions (padded) A 
Enter row vector C of order costs (padded) C 
Enter row vector D of demand values D 
Enter row vector PD of demand probabilities PD 
Enter unit selling price SP SP 
Enter backorder penalty cost BP BP 
PA = 
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GA = 



1.0000 






















0.6000 


0. 


.2000 





.2000 












0.2000 


0. 


,2000 





.2000 


0. 


,2000 


0. 


,2000 


0.8000 


0. 


.2000 

















0.4000 


0. 


.2000 





.2000 


0. 


,2000 







0.4000 


0. 


.2000 





.2000 


0. 


,2000 







0.6000 


0. 


,2000 





.2000 












0.2000 


0. 


.2000 





.2000 


0. 


,2000 


0. 


,2000 


0.2000 


0. 


.2000 





.2000 


0, 


,2000 


0. 


,2000 


0.4000 


0. 


.2000 





.2000 


0. 


,2000 







0.4000 


0. 


.2000 





.2000 


0. 


,2000 







0.4000 


0. 


.2000 





.2000 





,2000 







0.2000 


0. 


.2000 





.2000 


0. 


,2000 


0. 


,2000 


0.2000 


0. 


.2000 





.2000 


0. 


,2000 


0, 


,2000 


0.2000 


0. 


.2000 





.2000 


0. 


,2000 


0, 


,2000 







-40 




-80 




-120 




-160 


-300 




-100 




100 




60 




20 


-480 




-280 




-80 




120 




320 







200 




160 




120 




80 


-300 




-100 




100 




300 




260 


-300 




-100 




100 




300 




260 







200 




400 




360 




320 


-300 




-100 




100 




300 




500 


-300 




-100 




100 




300 




500 







200 




400 




600 




560 







200 




400 




600 




560 







200 




400 




600 




560 







200 




400 




600 




800 







200 




400 




600 




800 







200 




400 




600 




800 



2.2.2 Infinite-horizon strategy (no discounting) 
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polit 
Data needed: 



Enter type number to show gain type type 

Enter row vector of states states 

Enter row vector A of possible actions A 

Enter value of alpha (= 1 for no discounting) 1 

Enter matrix PA of transition probabilities PA 

Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities PD 

Index Action Value 

-80 

-44 

-80 

112 
52 
52 

256 

100 

100 

352 

352 

352 

400 

400 

400 
Initial policy: action numbers 

2 1111 
Policy: actions 

2 



1 





2 


2 


3 


4 


4 





5 


2 


6 


2 


7 





8 


2 


9 


2 


10 





11 





12 





13 





14 





15 






New policy: action numbers 

3 2 2 1 
Policy: actions 

4 2 2 
Long-run distribution 

0.2800 0.2000 0.2000 
Test values for selecting new policy 
Index 
1.0000 
2.0000 
3.0000 
4.0000 
5.0000 
6.0000 
7.0000 
8.0000 
9.0000 
10.0000 



Action 


Test Value 





-248.0000 


2.0000 


-168.8000 


4.0000 


-41.6000 





-48.8000 


2.0000 


-5.6000 


2.0000 


-5.6000 





131.2000 


2.0000 


138.4000 


2.0000 


138.4000 





294.4000 



1 


0.2000 



0.1200 
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11.0000 







294.4000 


12.0000 







294.4000 


13.0000 







438.4000 


14.0000 







438.4000 


15.0000 







438.4000 


Optimum p 


olicy 






State 




Action 


Value 







4.0000 


-168.0000 


1.0000 




2.0000 


-132.0000 


2.0000 




2.0000 


12.0000 


3.0000 







168.0000 


4.0000 







312.0000 


Long-run expe 


cted g 


ain per 


period G 


126.4000 
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2.2.3 Infinite-horizon strategy (with discounting) 



polit 
Data needed: 



Enter case number to show gain type type 

Enter row vector of states states 

Enter row vector A of possible actions A 

Enter value of alpha (= 1 for no discounting) 1/1.02 

Enter matrix PA of transition probabilities PA 

Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities PD 



Index 


Action 


Value 


1 





-80 


2 


2 


-44 


3 


4 


-80 


4 





112 


5 


2 


52 


6 


2 


52 


7 





256 


8 


2 


100 


9 


2 


100 


10 





352 


11 





352 


12 





352 


13 





400 


14 





400 


15 





400 



Initial policy: action numbers 

2 11 
Policy: actions 

2 
New policy: action numbers 

3 2 2 
Policy: actions 
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4 


2 




2 


Test values 


for select 


ing policy 


Index 




Action 


Test Value 


1.0e+03 


* 








0.0010 









6.0746 


0.0020 




0.0020 




6.1533 


0.0030 




0.0040 




6.2776 


0.0040 









6.2740 


0.0050 




0.0020 




6.3155 


0.0060 




0.0020 




6.3155 


0.0070 









6.4533 


0.0080 




0.0020 




6.4576 


0.0090 




0.0020 




6.4576 


0.0100 









6.6155 


0.0110 









6.6155 


0.0120 









6.6155 


0.0130 









6.7576 


0.0140 









6.7576 


0.0150 









6.7576 


Opt imum 


pol: 


icy 






State 




Action 


Value 


1.0e+03 


* 













0.0040 


6. 


,2776 


0.0010 




0.0020 


6. 


,3155 


0.0020 




0.0020 


6 


,4576 


0.0030 







6 


,6155 


0.0040 







6. 


,7576 



2.2.4 Finite-horizon calculations 



dpinit 
Initialize for finite horizon calculations 
Matrices A, PA, and GA, padded if necessary 
Enter type number to show gain type type 
Enter vector of states states 
Enter row vector A of possible actions A 
Enter matrix PA of transition probabilities 
Enter matrix GA of gains GA 
Enter row vector PD of demand probabilities 
Call for dprog 
dprog 
States and expected total gains 



PA 



PD 








1 


2 


3 


4 


-44 




112 


256 


352 


400 


States 


Act 


;ions 













2 








1 













2 













3 













4 













dprog 
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States 


and 


expected total gains 













1.0000 


2.0000 


3. 


,0000 


4.0000 


135.2000 


178.4000 


315.2000 


478. 


,4000 


615.2000 


States 


Actions 















4 










1 




2 










2 




2 










3 















4 















dprog 














States 


and 


expected total gains 













1.0000 


2.0000 


3. 


,0000 


4.0000 


264.4800 


300.4800 


444.4800 


600. 


,4800 


744.4800 


States 


Actions 















4 










1 




2 










2 




2 










3 















4 















dprog 














States 


and 


expected total gains 













1.0000 


2.0000 


3. 


,0000 


4.0000 


390.8800 


426.8800 


570.8800 


726, 


,8800 


870.8800 


States 


Actions 















4 










1 




2 










2 




2 










3 















4 















dprog 














States 


and 


expected total gains 













1.0000 


2.0000 


3. 


,0000 


4.0000 


517.2800 


553.2800 


697.2800 


853. 


,2800 


997.2800 


States 


Actions 















4 










1 




2 










2 




2 










3 















4 















dprog 














States 


and 


expected total gains 








l.Oe+Oc 


i * 














0.0010 0. 


.0020 




0.0030 


C 


1.6437 0.6797 0. 


.8237 




0.9797 


States 


Actions 















4 










1 




2 










2 




2 










3 















4 
















0.0040 
1.1237 
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2.3 MATLAB Calculations for Decision Models 3 
2.3.1 Data 

There are three types. In all types, we need the following: 

A = the vector of actions (1 x m) m = the number of actions 



)ns 




(1 x m) 


m 


~-Ui) 




(1X8) 


s 


Xj \H = 


= Ui) 


(s x q) 


q 



PH : PH (i) = P (H = u^ (1 x s) s = the number of values of H 

PXH : PXH(i,j)=P(X = Xj\H = Ui) (s X q) q = the number of values of X 

Type 1: The usual type. In addition to the above, we need 

L = [L (a, y k )] (m x n) m = the number of actions 

PYH : PYH (i, k) = P (Y = y k \H = Ui) (s x n) n = the number of values of Y 
Type 2: The matrix RH = [r (a, i)] is given. L and PYH are not needed. 
Type 3: Sometimes Y = H. In this case RH = L, which we need, in addition to the above. 



2.3.2 Calculated quantities 

1. RH = [r (a, i)} (m x s) [Risk function = expected loss, given H] r (a, i) = 
E [L (a, Y) \H = ml = J2 k L ( a > k) P (Y = y k \H = m) MATLAB: RH = L*PYH' 

2. PX (lxg) ' ^^(j) = P(X = x :7 -) = Y.i P { H = Ui) P(X=j\H = Ui ) MATLAB: 
PX = PH*PXH 

3. PHX (q x s) PHX(i,) = P(H = Uj \X = Xi) = 
P(X = Xi\H = Ui)P(H = Uj)/P(X = X t ) MATLAB: [a,b] = mesh- 
grid(PH,PX) PHX = PXH'.*a./b 

4. RX = [R (a, j)} (m x q) [Expected risk, given X] R (a, j) = E[r (a, H) \X = Xj] = 
J2i r (a, i)P(H = Ul \X = Xj) MATLAB: RX = RH*PHX' 

5. Select d from RX: d* (j) is the action a (row number) for minimum expected loss, given X = j. Set 
D=[d*(l), d*(2), ••• d*(q)}. 

6. Calculate the Bayesian risk BD for d*. BD = E[R (d* [X] , X)} = £\ RX [D (j) , j) PX (j) 
MATLAB: RD*PX' 

note: Actions are represented in calculations by action number (position in the matrix). In some 
cases, each action has a value other than its position number. The actual values can be presented 
in the final display. 

file deem 

°/,~f ile~dec .m 
'/„~Version~of "12/12/95 

disp('Decision~process~with~experimentation' ) 

disp ( ' There~are~three~types , ~according~to~the~data~provided . ' ) 
disp ( ' In~all~types , ~we~need~the~row~vector~A~of "actions , ' ) 
disp ( ' the~row~vector~PH~with~PH (i) ~=~P (H~=~u_i) , ' ) 
disp ('the~row~vect or ~X~of "test ~random~variable~ values, "and' ) 
disp('the~matrix~PXH~with~PXH(i, j)~=~P(X~=~x_j |H~=~u_i) . ') 
disp (' Type" 1 . ~~Loss~matrix~L~of ~L(a,k) ' ) 

dispC Matrix~PYH~with~PYH(i,k)~=~P(Y~=~y_k|H~=~u_i)') 

disp('Type~2.~~Matrix~RH~of~r(a,i)~=~E[L(a,Y) |H~=~u_i] . ') 



3 This content is available online at <http://cnx.Org/content/m31068/l.4/>. 
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disp(' L~and~PYH~are~not~needed~f or~this~type . ' ) 

disp('Type~3 . ~~Y~=~H, ~so~that~only~RH~=~L~is~needed. ' ) 

c =~input ( 'Enter~type~number~~ ' ) ; 

A =~input ( 'Enter~vector~A~of "actions" ' ) ; 

PH~~=~input ( 'Enter~vector~PH~of ~parameter~probabilities~~' ) ; 

PXH~=~input ( 'Enter~matrix~PXH~of ~conditional~probabilities~~ ' ) ; 

X =~input ( 'Enter~vector~X~of ~test~random~variable~values~~ ' ) ; 

s~=~length(PH) ; 

q~=~length(X) ; 

if~c~==~l 

~L =~input ( 'Enter~loss~matrix~L ' ) ; 

~PYH~=~ input ( 'Enter~matrix~PYH~of ~conditional~probabilities~~ ') ; 

~RH~~=~L*PYH'; 

elseif ~c~==~2 

~RH~~=~input ( 'Enter~matrix~RH~of ~expected~loss, ~given~H~~' ) ; 

else 

~L =~input ( 'Enter~loss~matrix~L ' ) ; 

~RH~~=~L; 

end 

PX~~~=~PH*PXH; °/„~(l~x~s)(s~x~q)~=~(l~x~q) 

[a,b]~=~meshgrid(PH,PX) ; 

PHX~=~PXH'.*a./b; 7„~(q~x~s) 

RX~~=~RH*PHX' ; °/.~(m~x~s) (s~x~q)~=~ (m~x~q) 

[RD~D] ~=~min(RX) ; y.~determines~min~of ~each~col 

°/ ~and.~z-ow~on~wliicli~min~occTiz-s 
S~=~[X;~A(D);~RD] '; 

BD~=~RD*PX' ; °/„~Bayesian~risk 

h~~=~ [' ~~Optimum~losses~and~actions'] ; 
sh~=~ [' ~ "Test "value" "Act ion Loss'] ; 

disp('~') 
disp(h) 
disp(sh) 
disp(S) 
disp('~') 
disp(['Bayesian~risk B(d*)~=~' ,num2str (BD) ,]) 

Example 2.3: General case 

7,~f ile~decl.m 
y„~Data~for~Problem~22-ll 
type~=~l; 

A~=~ [10~15] ; °/„~Artif icial~actions~list 

PH~=~ [0 . 3~0 . 2~0 . 5] ; ~~~7„~PH (i) ~=~P (H~=~i) 
PXH~=~ [0 . 7~0 . 2~0 . 1 ; ~~~°/.~PXH (i , j ) ~=~P (X~=~ j I H=~i) 

0.2~0.6~0.2; 

0.1~0.1~0.8] ; 

X~=~[-1~0~~1]; 

L~=~ [l~~0~-2; 7„~L(a,k) ~=~loss~when~action~number~is~a,~outcome~is~k 

3~-l~-4]; 

PYH~=~ [0 . 5~0 . 3~0 . 2 ; ~~~°/„~PYH (i , k) ~=~P (Y~=~k I H~=~i) 

0.2~0.5~0.3; 

0.1~0.3~0.6] ; 
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decl 
dec 

Decision~process~with~experimerLtation 

Th.ere~are~three~types , ~according~to~the~data~provided . 
In~all~types , ~we~need~the~row~vector~A~of "actions , 
the~row~vector~PH~with~PH(i)~=~P(H~=~i) , 
the~row~vector~X~of ~test~random~variable~values , "and 
the~matrix~PXH~with~PXH(i, j)~=~P(X~=~j |H~=~i) . 
Type~l . ~~Loss~matrix~L~of ~L(a,k) 

Matrix~PYH~with~PYH(i,k)~=~P(Y~=~k|H~=~i) 

Type~2.~~Matrix~RH~of~r(a,i)~=~E[L(a,Y) |H~=~i] . 

L~and~PYH~are~not "needed" in~this~ case . 
Type~3 . ~~Y~=~H,~so~that~only~RH~=~L~is~needed. 
Enter~type~number~~type 
Ent er~ vector ~A~of "act ions" A 

Ent er~ vector ~PH~of "parameter "probabilities PH 
Enter~matrix~PXH~of ~conditional~probabilities~~PXH 
Ent er~ vector ~X~of "test "random" variable" values" ~X 
Enter~loss~matrix~L~~L 
Enter~matrix~PYH~of ~conditional~probabilities~~PYH 



~ Opt imum~losses~and~act ions 
"Test "value" "Act ion Loss 
-~-l.0000~~~15.0000~~~-0.2667 

0~~~15.0000~~~-0.9913 

-~~1.0000~~~15.0000~~~-2.1106 



Bayesian~risk~~B(d*) ~=~-l . 3 

Intermediate steps in solution of Example 1, to show results of various operations 

RH 
RH~~=~~0.1000~~~-0.4000~~~-1.1000 

0.4000~~~-1.1000~~~-2.4000 

PX 

PX~~=~~0.3000 0.2300 0.4700 

a 

a~~~=~~0.3000 0.2000 0.5000 

0.3000 0.2000 0.5000 

0.3000 0.2000 0.5000 

b 

b~~~=~~0.3000 0.3000 0.3000 

0.2300 0.2300 0.2300 

0.4700 0.4700 0.4700 

PHX 

PHX~=~~0.7000 0.1333 0.1667 

0.2609 0.5217 0.2174 

0.0638 0.0851 0.8511 

RX 

RX~~=~-0.1667~~~-0.4217~~~-0.9638 
-0.2667~~~-0.9913~~~-2.1106 
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Example 2.4: RH given 

°/,~f ile~dec2.m~~ 
°/.~Data~f or~type~in~wh.ich~RH~is~given 
type~=~2; 
A~=~[l~2]; 
X"=- [-1-1-3]; 
PH~=~[0.2~0.5~0.3] ; 
PXH~=~ [0 . 5~0 . 4~0 . 1 ; ~~~'/.~PXH (i , j ) ~=~P (X~=~ j I H~=~i) 

0.4~0.5~0.1; 

0.2~0.4~0.4] ; 

RH~=~[-10~~~5~-12; 

5~-10~~-5] ; °/.~r (a, i) ~=~expected~loss~when 
o^ action~is~a,~given~H~=~i 

dec2 

dec 
Decision~process~with~experimentation 

~ Instruct ion~lines~edited~out 

Enter~type~iiumber~~type 

Exit er~ vector ~A~of~actions~ A 

Exit er~ vector ~PH~of "parameter "probabilities PH 

Enter~matrix~PXH~of ~conditioiial~probabilities~~PXH 

Exit er~ vector ~X~of~test~random~ variable" values" ~X 

Enter~matrix~RH~of ~expected~loss , ~given~H~~RH 

~Optimum~losses~and~actions 
~Test~value~~Action Loss 

~~-1.0000 2.0000~~~-5.0000 

~~~1.0000 2.0000~~~-6.0000 

~~~3.0000 1.0000~~~-7.3158 

Bayesian~risk B(d*) ~=~-5 . 89 

Example 2.5: Example 3 

Carnival example (type in which Y = H) 

A carnival is scheduled to appear on a given date. Profits to be earned depend heavily on the 
weather. If rainy, the carnival loses $15 (thousands); if cloudy, the loss is $5 (thousands); if sunny, 
a profit of $10 (thousands) is expected. If the carnival sets up its equipment, it must give the 
show; if it decides not to set up, it forfeits $1,000. For an additional cost of $1,000, it can delay 
setup until the day before the show and get the latest weather report. 
Actual weather H = Y is 1 rainy, 2 cloudy, or 3 sunny. 
The weather report X has values 1, 2, or 3, corresponding to predictions rainy, cloudy, or sunny 
respectively. 

Reliability of the forecast is expressed in terms of P (X = j\H = i)- see matrix PXH 

Two actions: 1 set up; 2 no set up. 

Possible losses for each action and weather condition are in matrix L. 

7,~f ile~dec3,m 
7,~Carnival~problem 
type~=~3 ; °/.~Y~=~H~~ (actual~weather) 
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A~=~ [1~~2] ; °/„~l : ~setup~~2 : ~no~setup 

X~=~ [1~~2~~3] ; 7„~1 ; "rain, ~~2 : "cloudy , ~3 : "sunny 

L~=~ [16~6~-9; °/,~L(a,k) ~=~loss~when~action~number~is~a,~outcome~is~k 
2~2~~2] ; ~ /.~--with~premium~f or~postponing~setup 

PH~=~0.1*[1~3~6] ; °/„~P(H~=~i) 

PXH~=~0.1*[7~2~1; '/ d ~PXH(i,j)~=~P(X~=~j |H~=~i) 

2~6~2; 

1~2~7]; 

dec3 

dec 

Decision~process~with~experimentation 

~ Instruct ion~lines~edited~out 

Enter~case~number~~case 

Ent er~ vector ~A~of "act ions" A 

Ent er~ vector ~PH~of "parameter "probabilities PH 

Enter~matrix~PXH~of ~conditional~probabilities~~PXH 

Ent er~ vector ~X~of "test "random" variable" values" ~X 

Enter~loss~matrix~L~~L 

"Opt imum~losses~and~act ions 



Test~value~ 


Action 


Loss 


""1.0000""" 


"2.0000"" 


""2.0000 


""2.0000""" 


"1.0000"" 


""1.0000 


""3.0000""" 


"1.0000"" 


"-6.6531 



Bayesian~risk~~B(d*) ~=~-2 . 56 



2.4 Matlab Procedures for Markov Decision Processes 4 

In order to provide the background for Matlab procedures for Markov decision processes, we first summarize 
certain essentials in the analysis of homogeneous Markov chains with costs and rewards associated with states, 
or with transitions between states. References are to Pfeiffer: PROBABILITY FOR APPLICATIONS. 

1. Some cost and reward patterns 

Consider a finite, ergodic (i.e., recurrent, positive, aperiodic, irreducible) homogeneous Markov chain 
Xjv, with state space E = {1, 2, • ■ • , M}. To facilitate use of matrix computation programs, we number 
states from one, except in certain examples in which state zero has a natural meaning. Associated with 
this chain is a cost or reward structure belonging to one of the three general classes described below. 
The one-step transition probability matrix is P = [p(i,j)}, where p(i,j) = P (X n+ i = j\X n = i). 
The distribution for X n is represented by the row matrix ir (n) = [pi (n) ,p 2 {n) , • • • ,pm (n)], where 
Pi (n) = P (X n = i). The long-run distribution is represented by the row matrix it = [m, W2, • • • , km}- 

a. Type 1. Gain associated with a state. A reward or gain in the amount g (i) is realized in 
the next period if the current state is i. The gain sequence{G n : 1 < n} of random variables 
Gn+i = 9 (X n ) is the sequence of gains realized as the chain Xjv evolves. We represent the gain 
function g by the column matrix g = \g (1) g (2) • • • g (M)} . 



This content is available online at <http://cnx.Org/content/m31095/l.7/>. 
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b. Type 2. One-step transition gains. A reward or gain in the amount g(i,j) = gij is realized 
in period n + 1 if the system goes from state i in period n to state j in period n + 1. The 
corresponding one-step transition probability is p(i,j) = Pij. The gain matrix is g = [g(i,j)]- 
The gain sequence{G n : 1 < n) of random variables G n +i = g (X n , X n+ i) is the sequence of 
gains experienced as the chain Xjv evolves. 

c. Type 3. Gains associated with a demand. In this case, the gain random variables are of the 
form 

G n+1 =g(X n ,D n+1 ) (2.1) 

If n represents the present, the random vector U n = (Xq,X\, ■ ■ ■ ,X n ) represents the "past" at 
n of the chain Xjv- We suppose {D n : 1 < n) is iid and each {D n+ i,U n } is independent. Again, 
the gain sequence{G„ : 1 < n} represents the gains realized as the process evolves. Standard 
results on Markov chains show that in each case the sequence Gn = {G n : 1 < n) is Markov. 

A recurrence relation. We are interested in the expected gains in future stages, given the present 
state of the process. For any n > and any m > 1, the gain G„ in the m periods following period n 
is given by 

Q\m) — Q n+1 -\- G n+ 2 + ■ ■ ■ + G n+m (2-2) 

If the process is in state i, the expected gain in the next period q,- is 

<h = v { l ] = E [G n+1 \X n = i] (2.3) 

and the expected gain in the next m periods is 

v\ m) =E[G^\X n = i] (2.4) 

We utilize a recursion relation that allows us to begin with the transition matrix P and the next-period 
expected gain matrixq = [qiq2 • • • q m ] and calculate the v) for any m > 1. Although the analysis is 
somewhat different for the various gain structures, the result exhibits a common pattern. In each case 



(m) 



m— i 

E [g^ ] \X n = i]=E [G n+1 \X n = i] + J2 E [Gn+k+i \X n = i] (2.5) 



fe=i 



Q t +2^2^E [G n+k+1 \X n+1 = j) p{i,j) (2.6) 



rn—1 

fe=i jeE 



jeE 



7 G n +k\X n 



fe=i 



P(i,j) (2-7) 



We thus have the fundamental recursion relation 



Mt>i m) = fc + X>? n - 1) P(M) (2-8) 

The recursion relation (*) shows that the transition matrix P = \p(i, j)] and the vector of next-period 

expected gains, which we represent by the column matrix q = [#1 , (?2 , • " " iQm] , determine the v \ f . 
The difference between the cases is the manner in which the q; are obtained. 



Type 1 
Type 2 
Type 3 



<ft = E [g (X n ) \X n = i}=g(i) 

q t = E[g(X n ,X n+1 ) \X n = i] = E[g(i,X n+1 ) \X n = i] = Y,j eE 9 (h j)p (*, j) 

qi = E[g (X n , D n+1 ) \X n = i]=E[g (i, D n+l )] = E fc 5 (*, k)P{D = k) 
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Matrix form: . For computational purposes, we utilize these relations in matrix form. If 



v (n) = \v{ v 



,(")„,(") 



v 



and q= [qiq 2 - ■ ■ QmY 



(2-9) 



then 



(*)v(m+ 1) = q + Pv(m) for all m > 0, with v (0) = (2.10) 

Examination of the expressions for q;, above, shows that the next-period expected gain matrix q takes 
the following forms. In each case, g is the gain matrix. In case c, pu is the column matrix whose 
elements are P (D = k). 

Type 1: q = g 

Type 2: q = the diagonal of Pg T 

Type 3: q = g Pr , 

2. Some long-run averages 

Consider the average expected gain for m periods 



E 



1 



-G 



(m) 



1 m 1 m 

— / , E [Gn+k] = — / _, E{E [G n+ k\X n -i]} 



fc=l fe=l 

Use of the Markov property and the fact that for an ergodic chain 



1 m 



as m — » oo 



fc=i 



yields the result that, 



lim E 

m — »oo 



-G (r ' 



Y^ P {X„-i =i)^2 qjTTj = XI 1i n J = 9 



(2.11) 



(2.12) 



(2.13) 



and for any state i, 



lim E 

m — >oo 



^-Gi^\X n = i 
m 



lim — v. 

m—>oom 



(m) 
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(2.14) 



The expression for g may be put into matrix form. If the long-run probability distribution is repre- 
sented by the row matrix 7r = \i^\i^2 • • ' tm] and q = \q\qi • • • ; qu\ , then 

ff = ^q (2.15) 



3. Discounting and potentials 

Suppose in a given time interval the value of money increases by a fraction a. This may represent 
the potential earning if the money were invested. One dollar now is worth 1 + a dollars at the end 
of one period. It is worth (1 + a) n dollars after n periods. Set a = 1/ (1 + a), so that < a < 1. 
It takes a dollars to be worth one dollar after one period. It takes a" dollars at present to be worth 
one dollar n periods later. Thus the present worth or discounted value of one dollar n periods in 
the future is a" dollars. This is the amount one would need to invest presently, at interest rate a per 
unit of time, to have one dollar n periods later. If f is any function defined on the state space E, we 
designate by f the column matrix [/ (1) / (2) • • • / (M)] . We make an exception to this convention in 
the case of the distributions of the state probabilities ir (n) = [pi (n) pi (n) ■ ■ -pu (n)], their generating 
function, and the long-run probabilities ir = [ttitt2- ■ -ttm], which we represent as row matrices. It 
should be clear that much of the following development extends immediately to infinite state space 
E = N = {0, 1, 2, • • • }. We assume one of the gain structures introduced in Sec 1 and the corresponding 
gain sequence {G n : 1 < n}. The value of the random variable G n +i is the gain or reward realized 
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during the period n + 1. We let each transition time be at the end of one period of time (say a month 
or a quarter). If n corresponds to the present, then G n +k ls the gain in period n + k. If we do not 
discount for the period immediately after the present, then a k ~ 1 G n+ k is the present value of that gain. 
Hence 



2_\ ot k 1 G n +k is the total discounted future gain 



fc=i 



Definition. The a-potential of the gain sequence {G n : 1 < n) is the function <j> defined by 



4>{%)=E 



2 , Qj"G n +i|.Xo 



,n=0 



VieE 



(2.16) 



(2.17) 



Remark. <fi (i) is the expected total discounted gain, given the system starts in state i. We next define 
a concept which is related to the a-potential in a useful way. Definition. The a-potential matrixR" 
for the process Xjv is the matrix 



R Q = Y] " np " with P° = I 



(2.18) 



n=0 



Theorem 2.1: 3.1 

Let Xjv be an ergodic, homogeneous Markov chain with state space E and gain sequence 
{G n ■ 1 < n}. Let q = [qiqi ■ ■ ■ qm] where qi = E [G n+ \\X n = i] fori G E. For a G (0, 1), let 



be the a-potential for the gain sequence. That is, 



<H*)=E 



y^a n G n+ i\X a 



n=0 



ViGE 



Then, if R" is the a-potential matrix for Xjv, we have 

(/> = R"q 
— D 



(2.19) 



(2.20) 



Theorem 2.2: 3.2 

Consider an ergodic, homogeneous Markov chain Xjv with gain sequence {G n : 1 < n) and 
next-period expected gain matrix q. For a G (0,1), the a-potential <j> an d the a-potential 
matrix R Q satisfy 

[I - aP] R Q = land [I - aP] <j> = q 



If the state space E is finite, then 

R" = [I - aP] _1 and0 = [I - aP] _1 q = R a q 

— D 

Example 2.6: A numerical example 

Suppose the transition matrix P and the next-period expected gain matrix q are 



(2.21) 

(2.22) 



0.2 0.3 0.5 
0.4 0.1 0.5 
0.6 0.3 0.1 



and q : 



(2.23) 
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For a = 0.9, we have 



R 



0.9 



(I-0.9P) 



4.4030 2.2881 3.3088 
3.5556 3.1356 3.3088 
3.6677 2.2881 4.0441 



and 



R 09 q : 



30.17 
32.72 
30.91 



(2.24) 



□ 



The next example is of class c, but the demand random variable is Poisson, which does not have finite 
range. The simple pattern for calculating the q; must be modified. 

Example 2.7: Costs associated with inventory under an (m, M) order policy. 

If k units are ordered, the cost in dollars is 



c(k) = { 







fc = 



10 + 25A; < k < M 



(2.25) 



For each unit of unsatisfied demand, there is a penalty of 50 dollars. We use the (m, M) 
inventory policy described in Ex 23.1.3. Xq = M, and X n is the stock at the end of period 
n, before restocking. Demand in period n is D n . The cost for restocking at the end of period 

n + 1 is 



g(X n ,D n+ i) = { 



10 + 25 (M - X n ) + 50max{D n+1 - M, 0} < X n < 



50max{D n+1 - X n , 0} m < X n < M 

For m = 1, M = 3, we have 

g (0, Ah-i) = 85 + 50/ [3 ,oo) (D n+ i) (D n+1 - 3) 



(2.26) 



(2.27) 



g (i, D n+1 ) = 50/ [3iOo) (D n+1 ) (D n+1 - i) 1 < i < 3 (2.28) 

We take m = 1, M = 3 and assume D is Poisson (1). From Ex 23.1.3 in PA, we obtain 



0.0803 0.1839 0.3679 0.3679 

0.6321 0.3679 

0.2642 0.3679 0.3679 

0.0803 0.1839 0.3679 0.3679 



(2.29) 



The largest eigenvalue |A| w 0.26, so n > 10 should be sufficient for convergence. We use 
n= 16. 
Taking any row of P 16 , we approximate the long-run distribution by 



7T = [0.2858 0.2847 0.2632 0.1663] 



(2.30) 



We thus have 
q = 85+50E [7(3,00) (D„+i) (D n+1 - 3)] = 85+50 J^ (k - 3)p fc ( the term for k = 3 is zero) (2.31) 



fe=4 

DC 



For the Poisson distribution J2T=n ^Pk = ^J2T=n~iPk- 



Hence q = 85 + 50 Efel 3 P*= ~ 3 Efel 4 Pfc] 



50 [0.0803 - 3 x 0.0190] = 86.1668 q t 
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50£r=3(*-l)Pfc = 50Er=2Pfc- 

50 [0.2642 - 2 x 0.0803] = 5.1809 q 3 



= q - 85 : 



= 50p 2 = 9.1970 q 2 = 50Er= 3 ( fc -2)w 
1.1668 so that 



86.1668 
9.1970 
5.1809 
1.1668 



(2.32) 



Then, for a = 0.8 we have 



R 



0.8 



(I - 0.8P)~ 



1.9608 1.0626 1.1589 0.8178 

1.4051 2.1785 0.8304 0.5860 

1.1733 1.2269 2.1105 0.4893 

0.9608 1.0626 1.1589 1.8178 



and 



R u *q 



185.68 

146.09 

123.89 

100.68 

(2.33) 



Recall that we are dealing with expected discounted costs. Since we usually consider starting 
with a full stock, the amount <f>(M) = <f> (3) = 100.68 is the quantity of interest. Note that 
this is smaller than other values of <j>(j), which correspond to smaller beginning stock levels. 
We expect greater restocking costs in such cases. 



4. Evolution of chains with costs and rewards 

a. No discounting A generating function analysis of 

(*) v (to + 1) = q + Pv (to) for all to > 0, withv(0) = 

shows 

v (n) = ngl + v + transients, where v = Boq 



(2.34) 



(2.35) 



Here g = 7rq, is a column matrix of ones, P = P°°, and B is a matrix which analysis also shows 
we may approximate by 



B„ = B(1)«I + P 



(n + 1)P 



(2.36) 



The largest |Aj| < 1 gives an indication of how many terms are needed. 

Example 2.8: The inventory problem (continued) 

We consider again the inventory problem. We have 



0.0803 0.1839 0.3679 0.3679 

0.6321 0.3679 

0.2642 0.3679 0.3679 

0.0803 0.1839 0.3679 0.3679 



and 



86.1668 
9.1970 
5.1819 
1.1668 



(2.37) 



The eigenvalues are 1,0.0920 + i0.2434, 0.0920 - 0.2434 and 0. Thus, the decay of the 
transients is controlled by |A*| = (0.0920 2 + 0.2434 2 ) 1/2 = 0.2602. Since |A*| 4 w 0.0046, 
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the transients die out rather rapidly. We obtain the following approximations 



>16 



0.2852 0.2847 0.2632 0.1663 

0.2852 0.2847 0.2632 0.1663 

0.2852 0.2847 0.2632 0.1663 

0.2852 0.2847 0.2632 0.1663 



so that tt w [0.2852 0.2847 0.2632 0.1663] 

(2.38) 



The approximate value of Bq is found to be 



5Po 



0.4834 -0.3766 -0.1242 0.0174 

0.0299 0.7537 -0.5404 -0.2432 

-0.2307 -0.1684 0.7980 -0.3989 

-0.5166 -0.3766 -0.1242 1.0174 



(2.39) 



The value g = 7rq w 28.80 is found in the earlier treatment. From the values above, we 
have 



v = B q 



37.6 


6.4 


-17.8 


-47.4 



so that v (n) 



28.80n + 37.6 
28.80n + 6.4 
28.80n - 17.8 
28.80n - 47.4 



transients 



(2.40) 



The average gain per period is clearly g w 28.80. This soon dominates the constant 
terms represented by the entries in v. 

b. With discounting Let the discount factor be a € (0, 1). If n represents the present period, then 
G n+ i = the reward in the first succeeding period G n+ k = the reward in the icth succeding period. 
If we do not discount the first period, then 



(*n — G n+ i + aG n+ 2 + a G n+ s + ■ ■ ■ + a m G n+m — G n+ i + aG n+1 



(m-l) 



Thus 



M 

;| m) = E [gW \X n = i]= qi + aJ2p(hJ)v 



(m-l) 
3 



3 = 1 



In matrix form, this is 



v (n) = q + aPv (n — 1) 
A generating function analysis shows that 



,(«) 



Hence, the steady state part of 

M 



Vi + transients 1 < i < M 



M 



V 



In matrix form, 



(m) 



1i + a ^P(^J) v j is v i = Qi + oi'^2p{i,j)vjl<i< M 



3 = 1 



j=l 



v = q + aPv which yields v = [I — aP] q 



(2.41) 
(2.42) 
(2.43) 
(2.44) 

(2.45) 
(2.46) 
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Since the qi = E [G n +i|X„ = i] are known, we can solve for the v;. Also, since v™ 1 is the present 
value of expected gain m steps into the future, the v; represent the present value for an unlimited 
future, given that the process starts in state i. 

5. Stationary decision policies 

We suppose throughout this section that Xjv is ergodic, with finite state space E = {1, 2, • • • , M}. 
For each state i € E, there is a class A; of possible actions which may be taken when the process is 
in state i. A decision policy is a sequence of decision functions d\, di ■ ■ ■ such that 

The action at stage n is d n (X , Xi, ..., X n ) . (2.47) 

The action selected is in A; whenever X n = i. We consider a special class of decision policies. 
Stationary decision policy 

d n (Xq, Xi, ■ ■ ■ , X n ) = d(X n ) with d invariant with n (2.48) 

The possible actions depend upon the current state. That is d (X n ) e A; whenever X n = i. Analysis 
shows the Markov character of the process is preserved under stationary policies. Also, if E is finite and 
every stationary policy produces an irreducible chain, then an optimal stationary policy is optimal. We 
suppose each policy yields an ergodic chain. Since the transition probabilities from any state depend 
in part on the action taken when in that state, the long-run probabilities will depend upon the policy. 
In the no-discounting case, we want to determine a stationary policy that maximizes the gain g = 7rq 
for the corresponding long-run probabilities. We say "a stationary policy," since we do not rule out the 
possibility there may be more than one policy which maximizes the gain. In the discounting case, the 
goal is to maximize the steady state part v,- in the expressions 

v^ = Vi + transients 1 < i < M (2.49) 

In simple cases, with small state space and a small number of possible actions in each state, it may be 
feasible to obtain the gain or present values for each permissible policy and then select the optimum 
by direct comparison. However, for most cases, this is not an efficient approach. In the next two 
sections, we consider iterative procedures for step-by-step optimization which usually greatly reduces 
the computational burden. 

6. Policy iteration method- no discounting 

We develop an iterative procedure for finding an optimal stationary policy. The goal is to determine a 
stationary policy that maximizes the long run expected gain per period g. In the next section, we extend 
this procedure to the case where discounting is in effect. We assume that each policy yields an ergodic 
chain. Suppose we have established that under a given policy (1) v/ 1 ' = qi + ^ • p (i, j) w , where 

qi = E [G n +\\X n = i] In this case, the analysis in Sec 4 shows that for large n, (2) v\ n w ng + n, 
where g = ^i^ili We note that v,- and g depend upon the entire policy, while q; and p(i, j) depend 
on the individual actions a;. Using (1) and (2), we obtain 

ng + Vi = qt + ^p(i, j)[(n- 1) g + v j ] = q l + (n - 1) g + ^p(i, j)vj (2.50) 

i j 

From this we conclude (3) g + Vi = qi + ^2jP(i, j)vj for all i € E Suppose a policy d has been 
used. That is, action d(i) is taken whenever the process is in state i. To simplify writing, we drop 
the indication of the action and simply write p(i, j) for p t j (d(i)), etc. Associated with this policy, 
there is a gain g. We should like to determine whether or not this is the maximum possible gain, and 
if it is not, to find a policy which does give the maximum gain. The following two-phase procedure 
has been found to be effective. It is essentially the procedure developed originally by Ronald Howard, 
who pioneered the treatment of these problems. The new feature is the method of carrying out the 
value-determination step, utilizing the approximation method noted above. 
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Value-determination procedure We calculate g = J^ i iriqi = 7rq. As in Sec 4, we calculate 



v = B q « [I + P 



(s + 1) P ] q where P = UmP r 



(2.51) 



b. Policy-improvement procedure We suppose policy d has been used through period n. Then, 
by (3), above, 

9 + Vi = q t + ^p(i,j)vj (2.52) 

3 

We seek to improve policy d by selecting policy d , with d* (i) = a* k , to satisfy 

9* + ^PijVj = max{qi (a ik ) + ^Pij {ttik)vj : a ik e Ai}, 1 < i < M (2.53) 

3 3 

Remarks 

• In the procedure for selecting d , we use the "old" Vj. 

• It has been established that g* > g, with equality iff g is optimal. Since there is a finite number 
of policies, the procedure must converge to an optimum stationary policy in a finite number of 
steps. 

We implement this two step procedure with Matlab. To demonstrate the procedure, we consider the 
following 

Example 2.9: A numerical example 

A Markov decision process has three states: State space E = {1, 2, 3}. 
Actions: State 1: A x = {1, 2, 3} State 2: A 2 = {1, 2} State 3: A 3 = {1, 2} 
Transition probabilities and rewards are: 



Plj (1): [1/3 1/3 1/3] 


9ij (1): [1 3 4] 


Plj (2): [1/4 3/8 3/8] 


92, (2): [2 2 3] 


Plj (3): [1/3 1/3 1/3] 


93 j (3): [2 2 3] 






p 2j (1): [1/8 3/8 1/2] 


92 3 (1): [2 1 2] 


p 2j (2): [1/2 1/4 1/4] 


923 (2): [1 4 4] 






p 3j (1): [3/8 1/4 3/8] 


933 (1): P 3 3] 


p 3j (2): [1/8 1/4 5/8] 


933 (2): [3 2 2] 



Table 2.1 

Use the policy iteration method to determine the policy which gives the maximum gain g. 
A computational procedure utilizing Matlab We first put the data in an m-file. Since 
we have several cases, it is expedient to include the case number. This example belongs to 
type 2. Data in file md61.m 

type = 2 
PA = [1/3 1/3 1/3; 1/4 3/8 3/8; 1/3 1/3 1/3; 0; 

1/8 3/8 1/2; 1/2 1/4 1/4; 0; 

3/8 1/4 3/8; 1/8 1/4 5/8] 
GA=[134;223;223;000; '/. Zero rows are separators 
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111 


1 


4 4; 





0; 


2 3 3 


3 


2 2] 






A = [1 2 3 





12 1 


2] 




md61 










type = 2 










PA = 0.3333 


0.3333 




0.3333 


0.2500 




0.3750 




0.3750 


0.3333 




0.3333 




0.3333 















0.1250 




0.3750 




0.5000 


0.5000 




0.2500 




0.2500 















0.3750 




0.2500 




0.3750 


0.1250 




0.2500 




0.6250 


GA = 1 


3 


4 






2 


2 


3 






2 


2 


3 



















2 


1 


2 






1 


4 


4 



















2 


3 


3 






3 


2 


2 















Once the data are entered into Matlab by the call for file "md61," we make preparation for 
the policy-improvement step. The procedure is in the file newpolprep.m 

'/, file newpolprep.m 
'/„ version of 3/23/92 



disp( 
disp( 
disp( 
disp( 
disp( 
disp( 
disp( 
disp( 
disp( 



Data needed: ' ) 

Matrix PA of transition probabilities for states and actions') 
Matrix GA of gains for states and actions') 
Type number to identify GA matrix types') 

Type 1. Gains associated with a state') 

Type 2. One-step transition gains') 

Type 3. Gains associated with a demand') 
Row vector of actions') 
Value of alpha (= 1 for no discounting)') 



c = input ('Enter type number to show gain type '); 

a = input ('Enter value of alpha (= 1 for no discounting) '); 

PA = input('Enter matrix PA of transition probabilities '); 

GA = input ('Enter matrix GA of gains '); 

if c == 3 

PD = input('Enter row vector of demand probabilities '); 
end 
if c == 1 

QA = GA'; 
elseif c ==2 



QA = diag(PA*GA'); '/. (qxl) 
else 

QA = GA+PD'; 
end 
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A = input ( 'Enter row vector A of possible actions '); 
= length(PA(l,:)); 
= length (A) ; 

= input ('Enter n, the power of P 
= input ('Enter s, the power of P 



1 (lxq) 



m 

q 

n 

s 

QD = [(l:q)' A' QA] ; 

DIS = [' Index 

disp(DIS) 

disp(QD) 

if a == 1 

call = 'Call for newpol'; 
else 

call = 'Call for newpold' ; 
end 
disp(call) 



to approximate PO '); 
in the approximation of V '); 
7. First col is index; second is A; third is QA 
Action Value'] ; 



newpolprep 7. Call for preparatory program in file npolprep.m 
Data needed: 
Matrix PA of transition probabilities for states and actions 
Matrix GA of gains for states and actions 
Type number to identify GA matrix types 

Type 1. Gains associated with a state 

Type 2. One-step transition gains 

Type 3. Gains associated with a demand 
Row vector of actions 
Value of alpha (= 1 for no discounting) 



Enter s, the power of 
Index Action 



Enter type number to show gain type 2 

Enter value of alpha (=1 for no discounting) 1 

Enter matrix PA of transition probabilities PA 

Enter matrix GA of gains GA 

Enter row vector A of possible actions A 

Enter n, the power of P to approximate PO 16 

P in the approximation of V 

Value 

2.6667 

2.3750 

2.3333 



1.6250 

2.5000 



2.6250 

2.1250 



1. 


.0000 


1, 


,0000 


2. 


.0000 


2. 


,0000 


3. 


,0000 


3. 


,0000 


4. 


,0000 







5. 


,0000 


1. 


,0000 


6. 


,0000 


2 


,0000 


7, 


,0000 







8. 


,0000 


1, 


,0000 


9. 


,0000 


2 


,0000 



Call for newpol 
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The procedure has prompted for the procedure newpol (in file newpol.m) 

'/, file: newpol.m (without discounting) 
'/. version of 3/23/92 

d = input ('Enter policy as row matrix of indices '); 
D = A(d); '/, policy in terms of actions 

P = PA(d',:); '/, selects probabilities for policy 

Q = QA(d',:); '/, selects next-period gains for policy 

P0 = P~n; '/, Display to check convergence 

PI = P0(1,:); '/, Long-run distribution 

G = PI*Q '/, Long-run expected gain per period 

C = P + eye(P) ; 
for j=2:s 

C = C + p-j ; '/. C = I + P + P"2 + . . . + P"s 

end 

V = (C - (s+l)*P0 )*Q; '/. B - B0*Q 
dispC ') 

disp( 'Approximation to P0; rows are long-run dbn') 
disp(PO) 

disp( 'Policy in terms of indices') 
disp(d) 

disp( 'Policy in terms of actions') 
disp(D) 

disp('Values for the policy selected') 
disp(V) 

disp( 'Long-run expected gain per period G') 
disp(G) 

T = [(l:q)' A' (QA + PA*V)] ; '/, Test values for determining new policy 
DIS =[' Index Action Test Value']; 
disp(DIS) 
disp(T) 

disp('Check current policy against new test values.') 
disp('--If new policy needed, call for newpol') 
disp('--If not, use policy, values V, and gain G, above') 

newpol 
Enter policy as row matrix of indices [2 6 9] '/, A deliberately poor choice 

Approximation to P0; rows are long-run dbn 

0.2642 0.2830 0.4528 

0.2642 0.2830 0.4528 

0.2642 0.2830 0.4528 

Policy in terms of indices 
2 6 9 

Policy in terms of actions 
2 2 2 

Long-run expected gain per period G 
2.2972 
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Index 


Action 


Test Value 


1.0000 


1.0000 


2.7171 


2.0000 


2.0000 


2.4168 


3.0000 


3.0000 


2.3838 


4.0000 








5.0000 


1.0000 


1.6220 


6.0000 


2.0000 


2.5677 


7.0000 








8.0000 


1.0000 


2.6479 


9.0000 


2.0000 


2.0583 



Check current policy against new test values. 

--If new policy needed, call for newpol 

--If not, use policy and gain G, above '/, New policy is needed 

newpol 

Enter policy as row matrix of indices [1 6 8] 

Approximation to P0; rows are long-run dbn 

0.3939 0.2828 0.3232 

0.3939 0.2828 0.3232 

0.3939 0.2828 0.3232 

Policy in terms of indices 
16 8 

Policy in terms of actions 
12 1 

Values for selected policy 
0.0526 
-0.0989 
0.0223 

Long-run expected gain per period G 
2.6061 



Index 


Action 


Test Value 


1.0000 


1.0000 


2.6587 


2.0000 


2.0000 


2.3595 


3.0000 


3.0000 


2.3254 


4.0000 








5.0000 


1.0000 


1.6057 


6.0000 


2.0000 


2.5072 


7.0000 








8.0000 


1.0000 


2.6284 


9.0000 


2.0000 


2.1208 



Check current policy against new test values. 

--If new policy needed, call for newpol 

--If not, use policy, values V, and gain G, above 
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Since the policy selected on this iteration is the same as the previous one, the procedure has 
converged to an optimal policy. The first of the first three rows is maximum; the second of 
the next two rows is maximum; and the first of the final two rows is maximum. Thus, we 
have selected rows 1, 5, 6, corresponding to the optimal policy d* ~ (1 2 1). The expected 
long-run gain per period g = 2.6061. 
The value matrix is 



"l 

"2 



*-':s 



0.0527 
-0.0989 
0.0223 



(2.54) 



We made a somewhat arbitrary choice of the powers of P used in the approximation of 
Po and Bo- As we note in the development of the approximation procedures in Sec 4, the 
convergence of P™ is governed by the magnitude of the largest eigenvalue other than one. We 
can always get a check on the reliability of the calculations by checking the eigenvalues for P 
corresponding to the presumed optimal policy. For the choice above, we find the eigenvalues 
to be 1, -0.1250, 0.0833. Since 0.125 4 w 0.0002, the choices of exponents should be quite 
satisfactory. In fact, we could probably use P 8 as a satisfactory approximation to P , if that 
were desirable. The margin allows for the possibility that for some policies the eigenvalues 
may not be so small. — □ 

7. Policy iteration- with discounting 

It turns out that the policy iteration procedure is simpler in the case of discounting, as suggested by 
the evolution analysis in Sec 4. We have the following two-phase procedure, based on that analysis. 

a. Value-determination procedure. Given the q; and transition probabilities p (i, j) for the 
current policy, solve v = q + aPv to get for the corresponding v,- 



v= [I-aP] _1 q 



(2.55) 



b. Policy-improvement procedure Given {vi : 1 < i < M} for the current policy, determine a 
new policy d , with each d* (i) = a^* determined as that action for which 



M 



M 



■^2p* ihj) 



3 = 1 



max{qi (a ik ) 

k 



y^Pij {aik)vja lk e Ai} 



(2.56) 



3 = 1 



We illustrate the Matlab procedure with the same numerical example as in the previous case, using 
discount factor a = 0.8. The data file is the same, so that we call for it, as before. 

Example 2.10 

Assume data in file md61.m is in Matlab; if not, call for md61. We use the procedure 
newpolprep to prepare for the iterative procedure by making some initial choices. 

newpolprep 
Data needed: 
Matrix PA of transition probabilities for states and actions 
Matrix GA of gains for states and actions 
Type number to identify GA matrix types 

Type 1. Gains associated with a state 

Type 2. One-step transition gains 

Type 3. Gains associated with a demand 
Row vector of actions 
Value of alpha (= 1 for no discounting) 
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Enter type number to show gain type 2 
Enter value of alpha (= 1 for no discounting) 0.8 
Enter matrix PA of transition probabilities PA 
Enter matrix GA of gains GA 
Enter row vector A of possible actions A 
Enter n, the power of P to approximate P0 16 
Enter s, the power of P in the approximation of V 



Index 


Action 


Test Value 


1.0000 


1.0000 


2.6667 


2.0000 


2.0000 


2.3750 


3.0000 


3.0000 


2.3333 


4.0000 








5.0000 


1.0000 


1.6250 


6.0000 


2.0000 


2.5000 


7.0000 








8.0000 


1.0000 


2.6250 


9.0000 


2.0000 


2.1250 



Call for newpold 

The procedure for policy iteration is in the file newpold. m. 

'/, file newpold. m (with discounting) 
'/. version of 3/23/92 

d = input ('Enter policy as row matrix of indices '); 
D = A(d); 

P = PA(d',:); '/, transition matrix for policy selected 
Q = QA(d',:); '/, average next-period gain for policy selected 

V = (eye(P) - a*P)\Q; '/, Present values for unlimited future 

T = [(l:q)' A' (QA + a*PA*V)] ; '/. Test values for determining new policy 

dispC ') 

disp( 'Approximation to P0; rows are long-run dbn') 

disp(PO) 

disp(' Policy in terms of indices') 

disp(d) 

disp(' Policy in terms of actions') 

disp(D) 

disp(' Values for policy') 

disp(V) 

DIS =[' Index Action Test Value']; 

disp(DIS) 

disp(T) 

disp('Check current policy against new test values.') 

disp('--If new policy needed, call for newpold') 

disp('--If not, use policy and values above') 

newpold 
Enter policy as row matrix of indices [3 5 9] '/, Deliberately poor choice 

Approximation to P0; rows are long-run dbn 
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0.3939 0.2828 0.3232 
0.3939 0.2828 0.3232 
0.3939 0.2828 0.3232 

Policy in terms of indices 
3 5 9 

Policy in terms of actions 
3 12 



alues for 


policy 




10.3778 






9.6167 






10.1722 






Index 


Action 


Test Value 


1.0000 


1.0000 


10.7111 


2.0000 


2.0000 


10.3872 


3.0000 


3.0000 


10.3778 


4.0000 








5.0000 


1.0000 


9.6167 


6.0000 


2.0000 


10.6089 


7.0000 








8.0000 


1.0000 


10.7133 


9.0000 


2.0000 


10.1722 



Check current policy against new test values. 
--If new policy needed, call for newpold 
--If not, use policy and values above 

newpold 

Enter policy as row matrix of indices [1 6 8] 

Approximation to P0; rows are long-run dbn 

0.3939 0.2828 0.3232 

0.3939 0.2828 0.3232 

0.3939 0.2828 0.3232 

Policy in terms of indices 
16 8 

Policy in terms of actions 

12 1 

Values for policy 
13.0844 
12.9302 
13.0519 



Index Action Test Value 
1.0000 1.0000 13.0844 
2.0000 2.0000 12.7865 
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3.0000 


3. 


,0000 


12, 


,7511 


4.0000 












5.0000 


1. 


,0000 


12. 


,0333 


6.0000 


2. 


,0000 


12. 


,9302 


7.0000 












8.0000 


1. 


,0000 


13. 


,0519 


9.0000 


2. 


,0000 


12. 


,5455 



Check current policy against new test values. 
--If new policy needed, call for newpold 
--If not, use policy and values above 

Since the policy indicated is the same as the previous policy, we know this is an optimal 
policy. Rows 1, 6, 8, indicate the optimal policy to be d* ~ (1, 2, 1). It turns out in this 
example that the optimal policies are the same for the discounted and undiscounted cases. 
That is not always true. The v matrix gives the present values for unlimited futures, for each 
of the three possible starting states. 
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Chapter 3 

Queues with Poisson Arrivals, 
Exponential Servers 1 



A standard model of a queueing system with a single waiting line and one or more servers assumes that 
"customers" arrive according to a Poisson process with rate (A). The customer at the head of the line goes 
to the first available server, if there are more than one, or to the single server as soon as available, if there is 
only one. The servers operate independently (of each other and the arrival process), each with exponential 
service time. We suppose each server has the same distribution, exponential (/i). Such a system may be 
analyzed as a Markov birth-death process. An analysis of the long-run probabilities and expectations of 
various quantities after the system has settled down to equilibrium yields the results below. 

Calculation of these quantities is straightforward, but somewhat tedious if various cases are considered. 
Matlab procedures for single-server and two-server systems are utilized to make these calculations quickly 
and to present them in a useful way. 

Notation 

• N t = number in system (in service and waiting) at time t 

• Qt = number waiting to be served at time t 

• -Kj = Urn Pij (t) = long-run probability of being in state j 

t — *oo 

• Wt = waiting time for service for customer who arrives at time t 

• D t = waiting time plus service time for customer who arrives at time t 

• A = random variable with distribution of interarrival times 

• S = random variable with distribution of service times 

Long-run probabilities ttj = P (N t = j) for large t, s servers, E [A] = 1/A, E [S] = l//i 
For s = 1, 

. P =E[S\/E[A] = \/fx 

• 7T = 1 - pn n = (1 - p) p n 

• Nt is approximately geometric (1 — p) 

For s > 1, 

• p= E[S] /sE [A] = X/sil 

ttq (sp) n jn\ = 7To (X/p) n /n\ < n < s 

• K n = { 

TT (s s /sl) p n = tt l(sp) s /sl} p n ~ s s < n 



1 This content is available online at <http://cnx.Org/content/m31072/l.3/>. 
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For s 


= 2 






• 


7T = 


1-P 

1+P 


_ 2p-A 
2p;+A 


For s 


= 3 






• 


7T = 


1- 


-P 


l+2p+§ p 2 


For s 


= 4 






• 


7Tn = 




1-P 
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For large t, with the system in equilibrium 

E [D t ] = E[A}E [N t ] and E [W t ] = E \A] E [Q t ] ^E[S]E [N t ] 



(3.1) 



For s = 1 

• E[N t 



:=: -P— = -A- 

• E [Q t ] = pE[N t ]P {N t > 0) = p 

Mil 

fx—X 

• D t is approximately exponential (fi — A) 



E [W t ] =E[S]E [N t ] 



For s > 1 



(SP) S 



C=P(W t >0)=7T O5 $ 



-P) 



£[Qt]^ 



P(W t > v) = Ce- ( ^ s - x >v > 

i_ e -[/»(»-i)-^]« 



• P{D t >v 

• P{D t >v 



1 + <V 



(spy 



fi(s-l)-X 
1 + AiCw] for A = /x - 1) 



:*/i(l-p)£[W t ] 

for A//i(s-l) 






sp 



3.1 Matlab calculations for single server queue (in file queuel.m) 

L = input ('Enter lambda '); '/, Type desired value, no extra space 



M = input ('Enter mu '); 

a = [' lambda mu'] ; 

b = [L M] ; 

disp(a) 

disp(b) 



r = L/M; '/. Rho 

EN = r/(l - r); '/, E[N] 

EQ = r*EN; '/. E[Q] 

EW = EQ/L; '/. E[W] 

ED = EN/L; '/. E[D] 



'/, Type desired value, no extra space 
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ED']; °/. Identifies entries in B 



A = [' rho EN EQ EW 

B = [r EN EQ EW ED] ; 

disp(A) 

disp(B) 



v = input ('Enter row matrix of values v '); '/, Type matrix of desired values 

PD = exp(-M*(l - r)*v); '/. Calculates P(Dt > v) 

S = [' v P(D>v)']; 

s = [v; PD]'; 

disp(S) 

disp(s) 

Example 3.1 



queue 1 

Enter lambda 0. 1 

Enter mu 0.2 

lambda mu 
0.1000 0.2000 

rho EN 
0.5000 1.0000 



EQ EW ED 
0.5000 5.0000 10.0000 



Enter row matrix of values v [8 16 24] 

v P(D>v) 

8.0000 0.4493 

16.0000 0.2019 

24.0000 0.0907 



3.2 Matlab calculations for two-server queue (in file queue2.m) 

JVote that the procedure will not calculate P (D > v) if A = /x. 

L = input ('Enter lambda '); '/, Type desired value, no extra space 
M = input ('Enter mu '); '/, Type desired value, no extra space 
a = [' lambda mu'] ; 
b = [L M] ; 
disp(a) 
disp(b) 

r = L/(2*M); 

EQ = (2*r~3)/(l - r~2); 

EN = EQ + 2*r; 

EW = EQ/L; 

ED = EN/L; 
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A = [' rho EN EQ EW ED']; '/. Identifies entries in B 

B = [r EN EQ EW ED] ; 

disp(A) 

disp(B) 

v = input ('Enter row matrix of values v '); 

t = 2*M*EW*(1 - r)/(l - 2*r); 

PD2 = exp(-M*v) . *(1 + t.*(l - exp(-M*v + L*v))); '/. Calculates P(D > v) for L not equal M 

S = [' v P(D>v)']; 

s = [v; PD2]'; 

disp(S) 

disp(s) 

Example 3.2 

queue 2 



Enter lambda 


0.1 


Enter mu 


0. 


2 


lambda 




mu 


0.1000 




0.2000 


rho 




EN 


0.2500 




0.5333 



EQ EW ED 

0.0333 0.3333 5.3333 



Enter row matrix of values v [4 8 16] 

v P(D>v) 

4.0000 0.4790 

8.0000 0.2241 

16.0000 0.0473 



3.3 Comparison of single-server and two-server queues 

A queueing system has Poisson arrivals, rate A and exponential (/i) service times. 

a. In system one, there is one server, with expected service time l//z = 1 minute. Determine 

[E [N] ,E[Q],E[W],E [D] , and P (D > v) , v = 1, 3, 5, 10(3.2) 

for expected arrival rates A = 0.6,0.9,0.99 customers per minute. 

b. In system two there are two servers, each with expected service time l//i = 2 minutes. Calculate the 
same quantities as for system one and compare the results for the two systems. 

queue 1 

Enter lambda 0.6 
Enter mu 1 

lambda mu 
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0.6000 



1.0000 



rho 


EN EQ 


EW 


ED 


0.6000 


1.5000 0.9000 


1.5000 


2.5000 


Enter row matrix of values v 


[13 5 10] 




V 


P(D>v) 






1.0000 


0.6703 






3.0000 


0.3012 






5.0000 


0.1353 






10.0000 


0.0183 






Ov = ones(l, 


length (v)) ; 






R = r*0v; 




'/, Row vecti 


ar with a 


rl = R; 








Ell = B; 








vll = PD; 








queue 1 








Enter lambda 


0.9 






Enter mu 1 








lambda 


mu 






0.9000 


1.0000 






rho 


EN EQ 


EW 


ED 


0.9000 


9.0000 8.1000 


9.0000 


10.0000 


Enter row matrix of values v 


v '/. Calls 


for prev 


V 


P(D>v) 






1.0000 


0.9048 






3.0000 


0.7408 






5.0000 


0.6065 






10.0000 


0.3679 






R = r*0v; 








r2 = R; 








E12 = B; 








vl2 = PD; 








queue 1 








Enter lambda 


0.99 






Enter mu 1 








lambda 


mu 






0.9900 


1.0000 






rho 


EN EQ 


EW 


ED 


0.9900 


99.0000 98.0100 


99.0000 


100.0000 



Enter row matrix of values v v 
v P(D>v) 
1.0000 0.9900 
3.0000 0.9704 
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5.0000 


0.9512 


10.0000 


0.9048 


R = r*0v; 




r3 = R; 




E13 = B; 




vl3 = PD; 





queue2 

Enter lambda 0.6 

Enter mu 0.5 
lambda mu 
0.6000 0.5000 



'/, Begin calculations for second system 



rho 




EN 


EQ EW ED 


0.6000 




1.8750 


0.6750 1.1250 3.1250 


Enter row 


matrix of 


values v v 


V 




P(D>v) 




1.0000 




0.7501 




3.0000 




0.3988 




5.0000 




0.2019 




10.0000 




0.0328 




E21 = B; 






'/, Not necessary to determ 


v21 = PD2; 






7, they are the same as fo: 


queue2 








Enter lambda 


0.9 




Enter mu 


0. 


5 




lambda 




mu 




0.9000 




0.5000 




rho 




EN 


EQ EW ED 


0.9000 




9.4737 


7.6737 8.5263 10.5263 


Enter row 


matrix of 


values v v 


V 




P(D>v) 




1.0000 




0.9245 




3.0000 




0.7749 




5.0000 




0.6410 




10.0000 




0.3916 




E22 = B; 








v22 = PD2; 








queue2 








Enter lambda 


0.99 




Enter mu 


0. 


5 




lambda 




mu 




0.9900 




0.5000 





rho 



EN 



EQ 



EW 



ED 
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0.9900 99.4975 97.5175 98.5025 100.5025 



Enter row matrix of values v 



V 


P(D>v) 








1.0000 


0.9920 








3.0000 


0.9743 








5.0000 


0.9557 








10.0000 


0.9094 








E23 = B; 










v23 = PD2; 










C = [Ell; 


E21; zeros 


(Ell); E12; 


E22; zeros(Ell); E13; E23] ; 


disp(A) 










rho 


EN 


EQ 


EW 


ED 


disp(C) 










0.6000 


1.5000 


0.9000 


1.5000 


2.5000 


0.6000 


1.8750 


0.6750 


1.1250 


3.1250 

















0.9000 


9.0000 


8.1000 


9.0000 


10.0000 


0.9000 


9.4737 


7.6737 


8.5263 


10.5263 

















0.9900 


99.0000 


98.0100 


99.0000 


100.0000 


0.9900 


99.4975 


97.5175 


98.5025 


100.5025 



°/. Zeros are spacers 



H = [' rho v P(Dl>v) P(D2>v)']; 
PDV = [rl r2 r3; v v v; vll vl2 vl3; v21 v22 v23] ' ; 



disp(H) 

rho 
disp(PDV) 
1.0000 


V 

1.0000 


P(Dl>v) 

0.6703 


P(D2>- 

0.7501 


1.0000 


3.0000 


0.3012 


0.3988 


1.0000 


5.0000 


0.1353 


0.2019 


1.0000 


10.0000 


0.0183 


0.0328 


0.9000 


1.0000 


0.9048 


0.9245 


0.9000 


3.0000 


0.7408 


0.7749 


0.9000 


5.0000 


0.6065 


0.6410 


0.9000 


10.0000 


0.3679 


0.3916 


0.9900 


1.0000 


0.9900 


0.9920 


0.9900 


3.0000 


0.9704 


0.9743 


0.9900 


5.0000 


0.9512 


0.9557 


0.9900 


10.0000 


0.9048 


0.9094 
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This collection includes some m-files for problems supplementary to Pfeiffer: Applied Probability. In addition, 
the text file collection New Prob mfiles contains both the user defined programs for that work and a collection 
of mfiles for specific problems with properly formatted data which can be entered into the workspace by 
calling the appropriate file. These m-files come from a variety of sources ( e.g., exams or problem sets, hence 
the odd names) and may be useful for examples and exercises. 
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